SUMMARY

The Department of Defense Polygraph Institute, founded in
1986 under Department of Defense Directive 5210.78, established
a research division in January 1987. This report describes the
results of the Division's first three studies. 

Using a mock espionage paradigm, Experiment 1 estimated the 
accuracy of the four major counterintelligence screening tests 
used for aperiodic security screening of persons in special
access programs. Two hundred seven Army personnel and civilian 
employees at Ft. McClellan were given aperiodic security 
screening tests by examiners from the Army INSCOM, the Air Force 
Office of Special Investigations, the National Security Agency, 
and the Central Intelligence Agency. Forty-four of the 207 
subjects were 'guilty' of committing acts of simulated espionage
two months prior to the polygraph tests. Forty-seven subjects 
went through 'knowledge' scenarios in which they met someone who 
claimed to have committed espionage, and tried to recruit them to 
do likewise. Subjects in these two groups lied on the polygraph 
tests when they denied having committed espionage or knowing 
anybody who had. A third group of 116 subjects were 'innocent,' 
in that they were not programmed to be guilty or knowledgeable. 

The polygraph examiners were very accurate at clearing the 
programmed innocent persons. Excluding the 3% inconclusive
results, 94% of the innocent subjects were cleared. Moreover, 3
of the 6 programmed innocent subjects who were called deceptive
admitted to having engaged in significant unreported real life 
security incidents. When those admissions were taken into 
account, the false positive error rate was estimated to be about
3%.

There was a substantial problem in detecting the lies of the 
programmed-guilty and programmed-knowledgeable subjects.
Excluding the 0% inconclusive results, 34% of the guilty and
knowledgeable subjects were correctly identified as deceptive. 
The resulting false negative error rate of 66% was unexpected. 
Virtually no previous research has reported significant problems 
in detecting the lies of persons who have been programmed guilty
or knowledgeable in mock crime situations. However, despite the 
poor identification of programmed guilty and knowledgeable 
subjects, approximately 20% of all subjects made admissions about 
real world security violations. 

There are a number of possibilities as to why so many of the 
programmed guilty subjects were cleared. The results of 
Experiment 1 may reflect a reliable estimate of the actual 
validity of security screening tests in the field, or they may 
indicate the effects of some variables unique to that experimental 
situation. Experiments 2 and 3 tested some of the experimental 
variables thought to be relevant to the results of Experiment 1. 

Experiment 2 compared two testing strategies for when the
examiner must cover several relevant issues within the same 
examination. Most previous mock crime experiments have used 
single issue examinations, while the examinations in Experiment 1 
addressed several relevant issues. Subjects in Experiment 2 were 
programmed to be guilty of 0, 1, 2, or 3 different acts of mock
espionage or sabotage. Half of the subjects were tested with one 
triple issue test, and the other half were tested with three 
single issue tests. There was no difference in the accuracy of 
these two approaches to testing multiple issues. Excluding the 24%
inconclusives, 79% of the innocent and 93% of the guilty subjects
were correctly classified. However, neither approach was able to 
identify specifically which crime(s) the guilty subjects had 
committed. The high accuracy of the polygraph in identifying the 
guilty subjects in Experiment 2 was consistent with most previous 
research, and stands in contrast to the results of Experiment 1. 
Those results suggest that the false negative rate in Experiment 
1 was not caused by the use of multiple issue testing or by the 
use of mock espionage scenarios per se.

Experiment 3 examined two variables that differentiated 
Experiment 1 from most previous mock crime studies. First, the 
time interval between the commission of the mock crime and the 
polygraph was manipulated. In most previous mock crime 
experiments the polygraph examinations were administered 
immediately after the enactment of the mock crime. In Experiment 
1 approximately two months passed between the acts of mock 
espionage and the polygraph examinations. In Experiment 3 half 
of the subjects were tested immediately following their act of 
mock espionage and half were tested approximately 6 weeks later. 

A second variable examined in Experiment 3 concerned the 
specificity of the relevant questions. The relevant questions in 
Experiment 1 were worded in broad terms about generally defined
acts of espionage and security violations. However, most 
studies have used very specific relevant questions. In 
Experiment 3 half of the subjects were asked specific relevant 
questions in a criminal investigative type examination and half 
were asked broad relevant questions in a screening type 
examination. The experiment was designed so that the question 
specificity factor was experimentally crossed with the time lag. 
That way the effects of the two variables could be examined both 
individually and in combination. Experiment 3 failed to find any 
effect for the manipulation of either question specificity or the 
experimental time lag. Excluding the 6% inconclusive outcomes, 
9(9% of the Innocent and SIX of the Guilty subjects were 
correctly classified. These results suggest that neither the 
time lag nor the use of general relevant questions can be used as 
an explanation of the results of Experiment 1.

One other variable discussed in the report, but not 
evaluated experimentally, was motivation. There was no explicit 
reward for subjects to produce truthful outcomes in any of these 
experiments. The performance of the polygraph examinations in 
all three experiments was poorer than that reported in recent
mock crime analog studies and a recent field study of federal 
examiners conducting specific issue forensic polygraph 
examinations. The issue of motivation in mock crime experiments 
needs additional research. An analysis of heart rate showed that 
the subjects in Experiment 1 were significantly less aroused than 
subjects undergoing actual screening tests in the field. It is 
also possible that the instructions given to the subjects in 
Experiment 1 could have contributed to the creation of some false 
negative errors. They were told that any admissions of real world 
security violations or criminal activity could harm their 
careers. That could have caused some of the programmed 
guilty/knowledgeable subjects to be more concerned about the 
control questions than about the relevant questions. If the 
false negative errors in Experiment 1 were caused by either low 
arousal or low motivation, then there should be fewer such errors 
in the field. 

We have not yet been able to identify what factor(s) caused
the high false negative rate in Experiment 1. A number of 
hypotheses remain to be investigated. One possibility is that 
the examiners' expectations of a low base rate for deception may 
have increased false negative errors by influencing how they 
conducted the pretest interview. If that is the case, then
Experiment 1's results have important implications for the
practice of detection of deception in the field. 

The results from Experiment 1 were compared to actual 
data from Department of Defense polygraph examinations. Agency 4 
results most closely mapped on to actual Department of Defense 
parameters. It appears likely that some field examiners have 
adjusted their testing procedures to accommodate the low base 
rate of deception situation they face. This adjustment is 
partially successful in that they make very few false positive 
errors. However, the analysis suggests that many deceptive 
individuals may be incorrectly cleared. 

It may be premature to estimate the accuracy of 
counterintelligence screening examinations based on the evidence 
presently available. However, this research suggests that there 
are far fewer false positive errors than previously predicted and 
that false negative errors may be more of a problem than 
previously believed. 

Research offers a number of possibilities for improving the 
accuracy of screening. High payoff areas for future research 
include standardization of testing techniques, data analysis, 
decision making, new physiological measures, and computerization 
of chart analysis. 

GENERAL INTRODUCTION

Most research on the detection of deception has centered on 
the use of the control question test in criminal investigations.
This research has generally indicated that the control question
test is biased toward making false positive errors (see reviews 
by Kircher, Horowitz, & Raskin, 1988; Office of Technology 
Assessment [OTA] 1983; Raskin, 1988). That is, when the control 
question test errs, it tends to call innocent people deceptive. 
These reviews indicate that in high quality studies the control 
question test correctly classifies about 90 percent of the guilty 
and about 75% of the innocent subjects, although the range of 
validity estimates is large.[1] Considerable controversy
continues in the literature regarding which studies constitute 
the correct data base for developing validity estimates. 
Additional controversy concerns the underlying rationale of the 
control question test. Some contend that the rationale of the 
control question test is completely unreasonable, and that a high
false positive rate is inevitable with that technique (e.g.,
Lykken, 1981).

Even less is known about the accuracy of the control 
question test or other detection of deception techniques in 
screening situations. Only two studies have been reported. 
Correa & Adams (1981) reported 100 percent accuracy at 
discriminating between truthful and deceptive volunteers in a
mock screening situation (N = 40). However, their validity
estimate dropped to as low as 68% when the examiner had to 
identify which of three questions the mock guilty persons were 
lying about. 

Barland (1981) analyzed a mock screening study designed and 
conducted by Steven Diduch of the 902nd MI Group. In this study,
Military Intelligence examiners conducted a Counterintelligence 
Screening Test which utilized a directed lie control question 
test on 36 subjects, 30 of whom were instructed to lie to one of 
five items on a statement of personal history which they filled 
out for the experiment. The examiners cleared 76% of the 
programmed innocent subjects, and identified 81% of the 
programmed guilty subjects. As in Correa and Adams, the 
examiners were less accurate at identifying the precise question 
to which the mock guilty subjects were lying.

Since there is so little research on the accuracy of 
screening tests, many scientists have reasoned from the data on 
criminal investigative control question tests and have argued 
that screening tests should make many false positive errors 
(Lykken, 1981; Raskin, 1984, 1986a, 1986b). These authors also
argue that in national security screening for spies this tendency
to call innocent people deceptive will be greatly exacerbated by
what is known as the base rate problem.

[1] For example, the estimates of the accuracy of control question
tests with guilty criminal suspects in the OTA (1983) review ranged
from 71.[illegible]% to 98.6%. The OTA estimates for accuracy with
innocent suspects ranged from 12.5% to 84.1%.

The base rate problem in lie detection arises when only a few 
examinees are lying on the test. Consider the following 
hypothetical example. In a population of 1000 individuals to be
screened, assume that 10 are spies and 990 are innocent. 
Further, assume the highest estimates of validity, 95% accuracy 
with guilty subjects and 90% accuracy with innocent
subjects. Given these assumptions, at least 9 of the 10 spies 
will be called deceptive, and 891 (90% of 990) of the innocent
will be correctly cleared. However, 99 (10% of 990) of the 
innocent will be incorrectly called deceptive. This means that 
the confidence in a truthful outcome will approach 100% (891 of 
892 truthful outcomes will be correct). However, the confidence 
in a deceptive outcome will be only 8% (only 9 of the 108 
deceptive outcomes will be correct), and 92% of the deceptive
outcomes will be false positive errors. This analysis, known as 
a conditional probability analysis, is correct as long as the 
underlying assumptions of base rate and test validity are
correct.
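
The arithmetic of this hypothetical example can be checked with a
short sketch. The function name and the use of whole-number
percentages are illustrative choices made here, not part of the
original analysis; counts are rounded down to whole persons, as in
the text, so 9.5 detected spies is taken as 9.

```python
def screening_outcomes(population, spies, pct_guilty_correct, pct_innocent_correct):
    """Count the four possible outcomes in a hypothetical screening population.

    Percentages are whole numbers, and integer division rounds counts down
    to whole persons, matching the text (9.5 detected spies is taken as 9).
    """
    innocent = population - spies
    true_pos = spies * pct_guilty_correct // 100        # spies called deceptive
    false_neg = spies - true_pos                        # spies cleared
    true_neg = innocent * pct_innocent_correct // 100   # innocent persons cleared
    false_pos = innocent - true_neg                     # innocent called deceptive
    return true_pos, false_neg, true_neg, false_pos


true_pos, false_neg, true_neg, false_pos = screening_outcomes(1000, 10, 95, 90)

deceptive_outcomes = true_pos + false_pos   # all persons called deceptive
truthful_outcomes = true_neg + false_neg    # all persons cleared

# Confidence in an outcome: the share of such outcomes that are correct.
confidence_deceptive = true_pos / deceptive_outcomes
confidence_truthful = true_neg / truthful_outcomes

print(f"deceptive outcomes: {deceptive_outcomes}, correct: {confidence_deceptive:.1%}")
print(f"truthful outcomes: {truthful_outcomes}, correct: {confidence_truthful:.1%}")
```

Under these assumptions the sketch gives 108 deceptive outcomes, of
which about 8% are correct, and 892 truthful outcomes, of which
nearly all are correct. Changing the base rate argument (here 10
spies in 1000) shows how strongly the confidence in a deceptive
outcome depends on it.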

Unfortunately, many of the critics of security screening end 
their arguments with the conditional probability analysis, 
implicitly suggesting that the screening process ends with the 
result of the polygraph examination. This type of reasoning 
ignores two important points. First, there are a number of
safeguards built into the security screening system. Persons 
producing deceptive results are re-examined in order to determine
why they reacted. In the event that a deceptive outcome is not 
resolved through testing, other types of investigation are 
undertaken. The ultimate impact of these safeguards on the 
number of false positives is not considered in a simple 
conditional probability analysis. 

The second important point that is often ignored is the 
benefit gained by reducing the field of suspects. In the above 
hypothetical example the field of possible spies was reduced from 
1,000 potential spies to a field of 108 potential spies. A 
screening device that can reduce the field of suspects by an 
order of magnitude has considerable practical utility. 

There is some evidence that casts doubt on the predictions of 
a high false positive error rate in security screening 
examinations. The Department of Defense (DoD, 1986; 1987)
reports to Congress on the DoD polygraph program stated that of
8,381 security screening examinations conducted during 1986 and 
1987, only 53 produced a deceptive outcome, and all but four of 
those deceptive outcomes were confirmed by admissions made by the 
subjects. Even if the remaining four cases were all false 
positive errors, the maximum possible false positive error rate 
would be less than 0.05%. However, the number of deceptive 
people who were incorrectly cleared (false negative errors) is 
unknown. Thus, there is some evidence which suggests that the
federal security screening polygraph program may be very good at 
avoiding false positive errors. These data from the Department 
of Defense suggest that the assumptions used by scientists in 
calculating their conditional probability analyses are either 
incomplete or in error. It may be that there are considerable 
validity differences in the application of criminal investigative 
and security screening polygraph examinations. 
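
The upper bound on the false positive rate quoted above follows
directly from the reported counts. The sketch below is only a check
of that arithmetic; the variable names are illustrative, and the
second, stricter denominator is an assumption added here rather than
a figure from the DoD reports.

```python
examinations = 8381        # screening examinations reported for 1986 and 1987
deceptive_outcomes = 53    # examinations that ended in a deceptive outcome
unconfirmed = 4            # deceptive outcomes not confirmed by admissions

confirmed = deceptive_outcomes - unconfirmed

# Worst case for the test: every unconfirmed deceptive outcome was a
# false positive, taken as a share of all examinations.
rate_all_exams = unconfirmed / examinations

# Stricter denominator (an assumption of this sketch): exclude the
# examinees whose deceptive outcomes were confirmed by admissions.
rate_excluding_confirmed = unconfirmed / (examinations - confirmed)

print(f"{rate_all_exams:.4%}")
print(f"{rate_excluding_confirmed:.4%}")
```

Both versions of the bound stay under 0.05%. Neither says anything
about false negative errors, which cannot be estimated from these
counts.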

The reason for this apparent discrepancy between criminal 
investigations and screening situations is unknown. Barland 
(1988) has suggested that if the examiner believes that the
person to be tested is almost certainly truthful, this might bias
the examiner, altering the way the examination is conducted and 
the way the charts are interpreted. If that is true, then there 
would be a tendency to make it easier to clear both the innocent 
and guilty persons. That is, false positive errors may be 
reduced at the expense of increasing false negative errors. 

EXPERIMENT 1

Introduction 

Federal examiners use four types of examinations for 
aperiodic security screening of employees who already have 
security clearances. Since little is known about the accuracy of 
any of these techniques as screening tests, and since such tests 
are very important to national security, we decided to study 
those tests. Unfortunately, field research in the detection of 
deception is difficult, since a criterion for ground truth (who 
really is guilty and innocent) must be developed independently of 
the polygraph examination. This is difficult to do in criminal 
situations, but is doubly difficult in security screening where 
the base rate of deception is very low. Therefore, we examined 
security screening in an analog screening situation where we 
could exert experimental control over ground truth. 

However, there are several problems with analog studies. 
The subjects of an analog study know they are participating in an 
experiment. This may create a number of differences between 
research subjects and real world examinees in terms of their 
reasons for taking the polygraph, the amount of stress they are 
under during the examinations, the type and extent of their 
emotional reactions, and how they react to questioning following 
the tests. For example, in an analog situation the programmed 
guilty subject answers the screening question, "Have you ever 
transmitted classified information to a representative of a 
foreign government without authorization?" with a "No."
Technically, the programmed guilty subject is answering the 
screening question truthfully, for the confederate was not in 
fact representing a foreign government, and whatever information 
was passed was in fact authorized. This is a problem common to 
all mock-crime paradigms, but despite this, the detection rate 
for programmed guilty subjects in realistic analog mock crime 
studies is usually quite good. 

In addition, experimental subjects volunteer, whereas the 
screening examinees must take the polygraph tests aperiodically
to retain their clearances. This self-selection in experiments
may reduce the proportion of subjects seeking to conceal
real-life information from the examiner, and may reduce the
generalizability of analog screening studies. However, a more
serious problem for analog screening studies involves the
programmed-innocent subjects and real-life information. Because
the test questions encompass activities beyond the scope of the 
research study, programmed innocent subjects may attempt to
conceal information from the examiner regarding an actual
security incident that occurred prior to the study. Thus,
a deceptive outcome would be a false positive only in reference 
to the study scenario; it would be a true positive outcome in 
reality. Unless the subject volunteers information about
real-life guilt, the outcome would be miscategorized as an error
(false positive). Ground truth is thus harder to ascertain in 
analog screening studies than in analog criminal investigative
studies, especially for the programmed-innocent group.

Despite the problems associated with analog studies it was 
decided that they were the best way to obtain initial estimates 
of the validity of security screening polygraph examinations. In 
Experiment 1 we examined the validity of the four major federal 
security screening techniques with three criterion groups, 
Innocent, Guilty, and Knowledgeable. Innocent subjects were 
instructed to answer all test questions truthfully. Guilty 
subjects committed acts simulating espionage, and were instructed 
to deny having committed espionage during the polygraph 
examination. Knowledgeable subjects were exposed to an 
individual who claimed to be a spy, and were instructed to deny 
that knowledge during the polygraph examination. 

This study introduced several methodological changes in 
polygraph research paradigms, in the hope of improving the 
generalizability of the analog paradigm. Most previous research
has used only one examiner. That examiner was usually selected 
on the basis of availability to the experimenter rather than 
being randomly drawn from the target population. This study used 
18 federal polygraph examiners who conduct security screening
examinations on a daily basis. Although they were not randomly 
selected, they were believed to be generally representative of 
the federal screening examiner population. 

Most previous analog studies have administered the polygraph 
tests minutes after the programmed guilty subjects have committed 
the mock crime. In another effort to make this analog as
realistic as possible, the polygraph tests were not administered
until approximately two months after the mock crimes were 
committed. Furthermore, the crime scenarios were more complex 
than those of most previous studies. They required several actions over a
course of at least two days. 

Method 

Subjects 

The target population for generalization of this study was 
those government employees with Top Secret/Codeword clearances 
working in special access and other programs requiring that they 
be aperiodically polygraphed for counterintelligence purposes.
Unfortunately, such a population was not readily available at 
Fort McClellan, since very little Top Secret/Codeword material is 
handled on that post. The agencies participating in this study 
described the demographic characteristics of the target 
population, and an effort was made to select a group of civilian 
and military personnel at Ft. McClellan which would match the 
target population as closely as possible. 

During April and May of 1987, volunteers were solicited
during a series of small meetings (10 to 50 attendees) from among 
the military and civilian personnel at Ft. McClellan, Alabama.
During those meetings potential subjects were told the following: 
The Department of Defense wished to conduct a study on the 
accuracy of security screening polygraph tests. Volunteers would 
be asked to take a security screening polygraph examination to 
determine if they were security risks. Randomly selected 
volunteers would be asked to participate in a mock crime, which 
would require several hours of their time over several days. All 
volunteers would have to take a security screening polygraph test 
in August, 1987, which would take half a day. If the results 
were questionable, they would be asked to return for additional 
examinations. The tests would be conducted by federal polygraph 
examiners.

In an effort to simulate the anxieties of the target 
population, the potential volunteers were informed that any 
admissions of criminal activity or violations of a national 
security nature would be reported to an adjudication panel. If 
the adjudicators felt that the admissions were significant, an 
investigation could be opened, and their careers possibly 
damaged. They were told that if they had anything in their 
background which they didn't want to reveal, they should not 
volunteer for this study. Two motives for volunteering were 
proffered: they would be helping their country, and they would 
gain firsthand experience about the polygraph. No money or 
reward was offered either for participation or for passing the 
examination.

A total of 260 subjects volunteered for this study. Forty-six
subjects (all military) were lost through reassignment prior
to the polygraph examinations. An additional six subjects 
declined to participate in the guilt scenarios when they learned 
what was required. A total of 208 subjects were administered 
polygraph examinations. During the polygraph examinations, one 
of the subjects stated she was currently being treated for a 
relatively serious psychological problem that had begun after 
her initial briefing. Although the examiner continued the test, 
upon review the agency stated that its policy prohibits 
administering polygraph tests to persons with that type of 
disorder. Based on this information we removed this subject from 
the analysis. 

Of the 207 subjects whose data were retained for analysis,
44 were programmed guilty, 47 were programmed knowledgeable, and 
116 were programmed innocent. Subjects' ages ranged from 20 to 
59 with a mean of 36. There were 149 males and 58 females. 
Sixty-six of the subjects were civilians, and 131 were military. 
Military ranks ranged from E4 to Colonel. The civilians were all 
employees of Fort McClellan. 
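
The subject counts reported above can be cross-checked with a few
lines of arithmetic. This is only a bookkeeping sketch; the variable
names are illustrative, and every number is taken from the text.

```python
volunteers = 260
lost_to_reassignment = 46   # military subjects reassigned before testing
declined_scenarios = 6      # declined to participate in the guilt scenarios
removed_from_analysis = 1   # excluded after the polygraph examination

examined = volunteers - lost_to_reassignment - declined_scenarios
retained = examined - removed_from_analysis

# Breakdowns of the retained sample as reported in the text.
groups = {"guilty": 44, "knowledgeable": 47, "innocent": 116}
sexes = {"male": 149, "female": 58}

print(examined, retained, sum(groups.values()), sum(sexes.values()))
```

The attrition figures give 208 subjects examined and 207 retained,
and both the criterion groups and the male/female counts sum to the
same 207.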

Most subjects were asked about the highest level of 
classified material to which they had ever had access. Of the 
181 subjects for whom data were available, 50 reported having
worked with TOP SECRET material, 96 with SECRET, 8 with 
CONFIDENTIAL, and 27 denied having ever had access to classified
information. These data were not verified from official files. 

There is one clear difference between the subjects in this
study and the target population for generalization. Only a quarter
of the subject population is believed to have handled TOP SECRET 
material, whereas all of the subjects currently in the special 
access programs have access to TOP SECRET. 

Polygraph Examiners 

Eighteen polygraph examiners and five quality control
supervisors were provided by the participating agencies. Each 
agency provided five examiners (except Agency 1 and Agency 4, 
which sent four) and one supervisor (except Agency 4, which 
provided two). Selection of the examiners was left to the 
agencies. However, the examiners were required to be intimately 
familiar with the aperiodic testing technique employed by their 
agency, and their primary duty had to be conducting screening 
examinations. Although the examiners were not randomly selected, 
discussions with the agencies revealed no factor which appeared 
to introduce any systematic bias. The selection criteria used by 
the agencies generally included availability (no examiner was
brought in from overseas for this study) and experience (no
intern examiner was selected).

The ages of the 18 examiners ranged from 27 to 47, with a 
mean of 38. There were 15 males and 3 females. The examiners 
from three of the agencies had graduated from the Defense 
Polygraph Institute or its forerunner, and were certified 
examiners. Their years of experience ranged from 1 to 13, with a 
mean of 5. The number of screening examinations they had 
conducted ranged from 225 to 2,200, with a mean of 1,092. 

Apparatus 

All polygraph examinations were conducted at the Defense 
Polygraph Institute in small, plainly furnished rooms containing 
two chairs and a desk. The polygraphs were recessed into the 
desk, with the surfaces flush. One-way observation windows 
allowed the quality control supervisors to monitor each 
examination. Two video cameras were suspended from the ceiling
of each examination room and the cameras were visible to the 
subject. One camera viewed the subject, and the other the 
polygraph chart. The view of the subject was recorded throughout 
the examination. While the polygraph charts were being obtained, 
the view of the polygraph charts was recorded split-screen on the 
same videotape as the subject. 

The examiners used standard field polygraphs that were 
typical of those used by their agency for security screening 
polygraph examinations. All of the polygraphs were manufactured 
either by Stoelting or Lafayette. As a minimum, each polygraph 
measured respiration, skin resistance, and relative blood
pressure.

Procedure

Initial Handling. Following the briefing of the potential 
volunteers about the purpose of the study and the hazards 
involved, all who agreed to participate were assigned a subject 
identification number and were asked to fill out brief personal 
history questionnaires. The information from those 
questionnaires allowed the scenarios to be tailored to those 
subjects who would be programmed guilty or knowledgeable. The 
subjects also filled out questionnaires concerning their 
attitudes toward the polygraph and how accurate they believed the 
polygraph to be in a variety of situations. The subjects were 
also administered three tests by a psychologist. None of the 
results of these psychological tests are reported here. This 
initial handling session took about three hours. The 
psychological testing was discontinued for the final 35 subjects 
because a number of potential volunteers appeared to be deterred 
by the testing procedure and the amount of time involved. 
Appendix A contains copies of the following: Consent Form, 
Volunteer Affidavit, Personal Data Form, Polygraph Attitudes 
Questionnaire, Polygraph Accuracy Questionnaire, and Subject 
Debriefing Questionnaire. 

Scenarios. The subjects were randomly assigned to treatment 
conditions in a matrix of three guilt levels (innocent, guilty, 
and knowledgeable) and four test types. Once the number of 
subjects in a cell reached a predetermined ceiling, no more 
subjects were assigned to it. Uneven cell frequencies resulted 
in the final sample as the result of the attrition of subjects 
who were transferred from Ft. McClellan between initial
assignment to conditions and the polygraph examinations. In
addition, the last several subjects originally assigned to Agency 
3 were examined by Agency 4 because the testing took longer than 
anticipated, and the Agency 3 examiners were unable to remain at 
Fort McClellan. 
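
Random assignment to a matrix of cells with ceilings can be sketched
as follows. The report does not state the ceilings or the mechanics
of the randomization, so the ceiling values, the cell labels, and the
function name below are hypothetical and serve only to illustrate the
idea.

```python
import random

GUILT_LEVELS = ("innocent", "guilty", "knowledgeable")
TEST_TYPES = ("test 1", "test 2", "test 3", "test 4")  # hypothetical labels


def assign_subjects(subject_ids, ceilings, rng):
    """Randomly assign each subject to a (guilt level, test type) cell.

    A cell stops receiving subjects once it reaches its ceiling.
    """
    counts = {cell: 0 for cell in ceilings}
    assignment = {}
    for subject in subject_ids:
        # Only cells that are still below their ceiling are eligible.
        open_cells = [cell for cell, cap in ceilings.items() if counts[cell] < cap]
        if not open_cells:
            raise ValueError("all cells are full")
        cell = rng.choice(open_cells)
        counts[cell] += 1
        assignment[subject] = cell
    return assignment, counts


# Hypothetical ceilings: larger innocent cells, smaller deceptive cells.
ceilings = {
    (guilt, test): (32 if guilt == "innocent" else 17)
    for guilt in GUILT_LEVELS
    for test in TEST_TYPES
}

assignment, counts = assign_subjects(range(1, 261), ceilings, random.Random(0))
```

With 260 volunteers and these assumed ceilings, every subject
receives a cell and no cell exceeds its ceiling. Later attrition, as
described above, would still leave the final cell frequencies uneven.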

About a week after the initial handling session, the subjects 
assigned to the guilty and knowledgeable conditions were 
individually briefed about their assignment by a member of the 
research staff. At that time, they were given the time, date, 
location, and bona fides for their initial contact in the 
scenario. They were not given any details of the scenario, but 
were directed to obey to the letter all instructions from their 
contact. They were told that when it came time to take their 
polygraph test, they would have to lie on the test. Under no 
circumstances were they to admit to the examiner that they had 
participated in a scenario. They were further told that they 
were not to give out any details of the scenario, even if they 
had to take several examinations. The subjects were not given 
any countermeasure instruction. Two photographs were taken of 
each subject so that their scenario contact would be able to 
identify them. The briefing, which lasted about twenty minutes, 
was concluded by cautioning subjects not to discuss their guilt 
or knowledge role with anyone, not even their spouse. 

One to three days following this briefing, the guilty and
knowledgeable subjects started their scenarios. A total of ten 
scenarios (eight guilt and two knowledge) involving mock 
espionage had been prepared. Each scenario was tailored to the
background of the individual volunteers. For example, a black
subject would not be scheduled for a contact in a bar with a 
predominantly white red-neck clientele. 

All scenarios were planned and executed by five Air Force 
and INSCOM case officers who played the role of hostile 
intelligence agents. The scenarios all involved acts of mock 
espionage such as photocopying, photographing, or taking mock 
classified documents. None of the scenarios involved sabotage or 
terrorism, even though most security test formats included those 
areas. The guilt scenarios required two or three contacts 
involving one or two of the case officers over a period of two or 
three days. Most of the scenarios were executed during the 
evenings and on weekends in order to avoid arousing the suspicion
of the volunteers' coworkers. This procedure was designed to 
avoid problems of coworkers discovering that they were both 
participants in the study. 

Subjects assigned to the Innocent condition were not 
contacted again until approximately two months after the initial 
handling. During the second contact they were scheduled 
for their polygraph examination. 

Experimental Controls. One of the potential problems in 
this type of study was the possibility that the examiners might 
gain unfair advantages as the experiment progressed. For 
example, they might have learned details of the study such as the 
nature of the scenarios or the base rate of deception. We 
attempted to retard this process with the following procedures: 

(1) The examiners were not informed of the base rate for 
deception. Because they came from an environment where they 
believed the base rate for guilt to be very low, and because none 
had ever participated in a scientific study before, it is likely 
that they underestimated the number of persons programmed guilty 
or knowledgeable. This was supported by a survey of the 
examiners made late in the study. 

(2) The subjects were instructed not to reveal any details of 
their scenarios to the examiners. 

(3) The examiners were instructed not to solicit any scenario 
details during post-test interviews. This procedure created a 
problem for generalizability. Normal procedure calls for the 
examiner to obtain all relevant details of any admissions, 
followed by additional testing to determine the accuracy and 
completeness of a subject's explanations. We addressed this 
problem in the following way. When a confession was imminent, 
the examiner asked the subject to make a written statement while 
the examiner stepped out of the room. The subject then sealed 
his statement in an envelope, and the examiner resumed testing 
on the accuracy of the statement without knowing the details. 
This put the examiners at a slight disadvantage as compared to 
field testing. 

(4) The examiners were asked not to discuss their exams with 
other examiners. 

(5) A large number of scenarios were employed, so that even 
if the examiners were to learn the details of one scenario and 
were to exchange details with other examiners, it is unlikely 
that they would learn all the scenarios. 

Polygraph Examinations. Approximately two months after the 
scenarios had been enacted by the Guilty and Knowledgeable 
subjects, all subjects were administered a polygraph examination 
by an examiner from a federal agency. The examiners were 
instructed to follow their normal field procedures for security 
screening examinations. They were not to mention words such as 
"experiment," "study," or "scenario." They were instructed to 
work up the relevant and control questions exactly as they would 
in the field. They were free to run as many charts as they felt 
appropriate and to conduct re-examinations if they could not 
clear the subject on the first day. Each examiner conducted up 
to two exams per day. The only constraint upon the examiners was 
that they could not listen to the details of any confession. 
Confessions were handled with the procedures described above. 

Blind Evaluations 

Following the conclusion of all polygraph examinations, but 
before the results were released, the polygraph charts were 
submitted to quality control personnel from each of the agencies 
for independent blind evaluation. The independent evaluators 
made decisions on each relevant question and then gave a rating 
of each subject's overall truth and deception on an 11 point 
scale. Those ratings were converted to overall decisions of 
truthful, inconclusive and deceptive. Those overall decisions 
and the single question decisions were analyzed separately. 

Agency Evaluations 

Approximately one year following the conclusion of the 
polygraph examinations, evaluators from each of the agencies 
returned to Fort McClellan to evaluate the videotapes of the 
examinations to determine if the examinations conformed to each 
agency's standard practice for the conduct of security screening 
examinations. Some of those evaluations were performed blind to 
the subjects' conditions, and some were performed with a knowledge 
of the subjects' conditions. No examiner who conducted 
examinations in the original data collection participated in the 
blind evaluations. Two questionnaires were used for the blind 
and non-blind evaluations, and they are included as Appendix B. 

Categorization of Test Outcomes. The classification of test 
outcomes as correct or incorrect is more complex in analog 
screening studies than it is in investigations of mock crimes. 
In this study, the relevant questions were worded in general 
terms. A typical example would be, "Have you ever had 
unauthorized contact with an official or employee of a foreign 
government?" This generalized wording has an important 
implication for the classification of programmed innocent 
subjects, as they, or any subject, might have been deliberately 
concealing information pertaining to knowledge or acts prior to 
this study. For example, a soldier who had been stationed 
overseas may have improperly revealed classified information to a 
foreign girl friend. Since all subjects were explicitly warned 
that admissions would be adjudicated and could damage their 
careers, the soldier would not want to reveal this security 
violation to the examiner. Since security screening tests are 
designed to detect significant security compromises, a deceptive 
outcome verified by an admission of a serious compromise could be 
considered correct, despite the fact that the subject was 
programmed to be innocent. We felt that if the admissions of 
programmed innocent subjects reached a threshold of seriousness, 
such subjects should not be considered false positive errors. 

The determination of the threshold of significance for 
subject admissions was necessarily arbitrary. In this study, the 
admissions made by programmed innocent subjects were screened for 
significance by a panel of three researchers. The threshold was 
not explicitly defined, but the following factors were 
considered: classification level of the compromised information, 
recency of the incident, and actual or potential damage to the 
national security. No admission made by programmed innocent 
subjects during the pretest interview, regardless of 
significance, was considered by the classification panel, since 
in those cases there was no clear intent to deceive the examiner. 
An example of a set of admissions that was considered significant 
is as follows: following a deceptive outcome, a subject admitted 
that he had discussed classified codeword information in 
environments where it was likely that the information had been 
compromised, that he had taken classified material home, and that 
he had classified material at his home at the time of the test. 
There were seven programmed innocent subjects who produced 
deceptive outcomes. Of those seven, four made significant post- 
test admissions. Those four subjects are reported separately 
when appropriate. 

Another classification requiring explanation regards 
programmed Guilty/Knowledgeable subjects who confessed their 
scenario involvement and were given a final polygraph examination 
to determine whether they had told the complete truth. Even 
though the final examination may have resulted in a decision of 
truthfulness, the examination was classified as having a correct 
outcome, since the programmed deception had been detected. Thus, 
for the original examiners in Experiment 1, we categorized 
the accuracy of the system (the examiners' ultimate decision 
using all available information) rather than the more 
conventional blind evaluation of only the polygraph charts. 

There was one case where, following an inconclusive call on 
the first series, the subject confessed his scenario involvement 
on being asked if anything was troubling him on any of the 
questions. This was classified as a correct outcome rather than 
an inconclusive, since the subject's role was correctly 
identified as a result of the examination procedure. 

The classification rules were as follows: 

S Programmed   First Test   Admission   Retest    Classified 
               Outcome                  Outcome 

Innocent       NDI          --          --        Correct (True Negative) 
Innocent       Inc or DI    No          NDI       Correct (True Negative) 
Innocent       Inc or DI    No          Inc       Inconclusive 
Innocent       Inc or DI    No          DI        Incorrect (False Positive) 
Innocent       Inc or DI    Yes*        NDI       Correct (True Negative) 
Innocent       Inc or DI    Yes*        Inc/DI    Correct but Deceptive 
Gu or K        NDI          --          --        Incorrect (False Negative) 
Gu or K        Inc or DI    No          NDI       Incorrect (False Negative) 
Gu or K        Inc or DI    No          Inc       Inconclusive 
Gu or K        Inc or DI    No          DI        Correct (True Positive) 
Gu or K        Inc or DI    Yes         NDI       Correct (True Positive) 
Gu or K        Inc or DI    Yes         Inc/DI    Correct (True Positive) 

Gu = Guilty   K = Knowledgeable   Inc = Inconclusive 
DI = Deception Indicated   NDI = No Deception Indicated 

* Programmed innocent subjects admitting to deliberately holding back 
relevant information about a significant real world incident. 
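
The classification rules can also be read as a small decision procedure. The following is a minimal sketch in Python, not part of the original study materials; the function name `classify_outcome` and its argument names are hypothetical, chosen here only for illustration.

```python
def classify_outcome(programmed, first_test, admission=False, retest=None):
    """Return the outcome class under the classification rules listed above.

    programmed : "Innocent", "Guilty", or "Knowledgeable"
    first_test : "NDI", "Inc", or "DI"
    admission  : for Innocent subjects, a significant real-world admission;
                 for Guilty/Knowledgeable subjects, a scenario confession
    retest     : "NDI", "Inc", or "DI" (only used when first_test is not NDI)

    All names here are illustrative assumptions, not the study's own terms.
    """
    innocent = programmed == "Innocent"

    # A subject cleared on the first test is not retested.
    if first_test == "NDI":
        return "Correct (True Negative)" if innocent else "Incorrect (False Negative)"

    if not innocent:
        # A confession counts as a detection whatever the retest shows.
        if admission:
            return "Correct (True Positive)"
        return {
            "NDI": "Incorrect (False Negative)",
            "Inc": "Inconclusive",
            "DI": "Correct (True Positive)",
        }[retest]

    if admission:
        # Programmed innocent, but concealing a significant real incident.
        return {
            "NDI": "Correct (True Negative)",
            "Inc": "Correct but Deceptive",
            "DI": "Correct but Deceptive",
        }[retest]

    return {
        "NDI": "Correct (True Negative)",
        "Inc": "Inconclusive",
        "DI": "Incorrect (False Positive)",
    }[retest]


print(classify_outcome("Innocent", "DI", admission=False, retest="DI"))
print(classify_outcome("Guilty", "Inc", admission=True, retest="NDI"))
```

Written this way, the asymmetry in the rules is easy to see: a confession by a Guilty or Knowledgeable subject is scored as a detection regardless of the retest, while an Innocent subject's admission only changes the label when the retest is not NDI.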

Results² 

At all levels of the analysis statistical tests were used to 
look for differences between subjects programmed Guilty and those 
programmed Knowledgeable. Only one of those analyses produced a 
significant result. Therefore, for purposes of simplicity the 
subsequent sections will treat the Knowledgeable and Guilty 
manipulations as one Guilty/Knowledgeable condition, unless 
otherwise noted. 

Original Examiners' Classification Decisions. 

All Agencies. The overall performance of the original 
examiners is shown in Table 1 and rate summaries are provided in 
Table 2. The determination of rates with Innocent subjects is 
somewhat difficult. Seven programmed Innocent subjects were 
reported as deceptive, but four of those seven subjects made 
significant admissions to real world security violations during 
their post-test interviews. Those four subjects were not used in 
calculating the accuracy rates reported in this and subsequent 
sections. 

With Innocent subjects, the original examiners' 
classifications were 93% correct, 4% inconclusive, and 3% 
incorrect. Excluding inconclusives, the classification of 
Innocent subjects was significant³: 97% of the Innocent subjects 
were classified correctly, and 3% were false positive errors, z = 
-9.30, p < 0.001. Of the Guilty/Knowledgeable subjects, 31% were 
correctly identified as deceptive by the original examiners, 9% 
were reported as inconclusive, and 60% were incorrectly cleared. 
Excluding inconclusives, the classification of 
Guilty/Knowledgeable subjects was 34% correct, and 66% were 
false negative errors, z = -2.85, p < 0.01. Since the outcome 
falls in the opposite direction from the prediction of the 
appropriate alternative hypothesis for this one-tailed test, this 
result indicates that the classification of Guilty/Knowledgeable 
subjects was not different from chance. However, in a practical 
sense, this finding means that the classification of 
Guilty/Knowledgeable subjects was significantly worse than 
chance. 
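
The z value for the Guilty/Knowledgeable subjects can be checked from the counts in Table 1. The sketch below is an assumption about how the test was computed, not a procedure stated in the report: it uses the normal approximation to the binomial against a chance rate of 0.5, with a continuity correction, after dropping inconclusive outcomes. The function name `z_vs_chance` is hypothetical.

```python
import math


def z_vs_chance(correct, conclusive, p_chance=0.5):
    """Normal-approximation z for `correct` hits out of `conclusive` decisions.

    Assumes a continuity correction of 0.5; this is a guess at the
    report's method that happens to reproduce its Guilty/Knowledgeable z.
    """
    expected = conclusive * p_chance
    sd = math.sqrt(conclusive * p_chance * (1.0 - p_chance))
    diff = correct - expected
    # Shrink the deviation by 0.5 toward zero (continuity correction).
    corrected = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    return corrected / sd


# Table 1: 28 of the 83 conclusive Guilty/Knowledgeable decisions were DI.
rate = 28 / 83
z = z_vs_chance(28, 83)
print(f"correct rate = {rate:.0%}, z = {z:.2f}")  # 34% and about -2.85
```

The negative sign is the point of the paragraph above: the observed rate lies below chance, so a one-tailed test for better-than-chance detection cannot reject the null hypothesis, even though the deviation itself is large.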



2 Two broad types of statistics (hypothesis tests and magnitude of effect statistics) are reported in this and 
the subsequent results sections. Hypothesis tests (Chi Square, F, t, Binomial, and z, in this report) evaluate 
the outcome of an experiment against a fixed criterion of chance. In the behavioral sciences that criterion is 
usually set at a probability of 0.05. That is, if the results of an experiment are likely to occur only 5 times 
(or less) out of 100 replications by chance, they are accepted as likely to have been caused by the independent 
variables of the experiment, rather than to have occurred by sampling error. It is inappropriate, and may be very 
misleading, to interpret the probability levels of a hypothesis test as indicating the size of the effect of the 
independent variable. A hypothesis test result that is very unlikely by chance, p < 0.001, may be a smaller 
effect than a test with a calculated probability value that is much more likely by chance, p < 0.05, depending on 
the statistics used and the sample size. Do not use reported probability values to evaluate the size of the 
effects of the variables tested. The size of the effect of a relationship between two variables is appropriately 

Table 1⁴. Classification results of the 19 original examiners 
from all agencies. 

              NDI     Inc     DI         Totals 

Innocent      105      4      3 + (4)     116 

Guilty         55      8      28           91 

Totals        160     12      35          207 


The predictive relationship illustrated in Table 1 was 
evaluated in several ways. First, a Chi Square (X²) analysis was 
conducted on the Innocent and Guilty/Knowledgeable by Decision 
(Truthful, Inconclusive, and Deceptive) contingency table shown 
as Table 1, and the resulting X² was significant, X²(2) = 
35.33, p < 0.0001. This result indicates that the decisions on 
Innocent and Guilty/Knowledgeable subjects were not randomly 
distributed across the cells of the contingency table. 
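
As an illustration of this test, the Pearson chi-square statistic can be computed directly from the Table 1 counts. The sketch below is not from the original report; the function name `chi_square` is hypothetical, and the table passed in assumes that the four programmed Innocent subjects with significant admissions are left out, which is the treatment that reproduces the reported value.

```python
def chi_square(table):
    """Pearson chi-square and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Rows: Innocent, Guilty/Knowledgeable. Columns: NDI, Inc, DI.
# Assumption: the four admitting Innocent subjects are excluded (3, not 7).
table_1 = [
    [105, 4, 3],
    [55, 8, 28],
]
statistic, df = chi_square(table_1)
print(f"X2({df}) = {statistic:.2f}")  # about 35.33 on 2 degrees of freedom
```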

Then, the tau c (Norusis, 1986) statistic was used to measure 
the magnitude of the predictive relationship of the original 
examiners' decisions for the Innocent, Guilty/Knowledgeable 
criteria. Tau c is a nonparametric measure of association that 
can range from -1.0 to +1.0 and can be interpreted in the same 
manner as a correlation coefficient (Siegel, 1956). We used tau 
c as an index of predictive performance, and as a statistic for 
comparison of discrimination performance between agencies. 
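
For readers who want to see how tau c behaves on a decision table, the following sketch computes Kendall's tau c (Stuart's tau c) from counts of concordant and discordant pairs. It is an illustration rather than the report's own computation; the function name `kendall_tau_c` is hypothetical, and the input again assumes the four admitting Innocent subjects are excluded.

```python
def kendall_tau_c(table):
    """Kendall's tau c for an ordered rows-by-columns table of counts.

    tau_c = 2 * m * (P - Q) / (n**2 * (m - 1)), where P and Q are the
    numbers of concordant and discordant pairs, n is the total count,
    and m is the smaller of the number of rows and columns.
    """
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)

    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]

    m = min(n_rows, n_cols)
    return 2 * m * (concordant - discordant) / (n * n * (m - 1))


# Rows ordered Innocent, Guilty/Knowledgeable; columns ordered NDI, Inc, DI.
all_agencies = [[105, 4, 3], [55, 8, 28]]
print(round(kendall_tau_c(all_agencies), 2))  # 0.34
```

Because the statistic depends only on the ordering of the decision categories, it needs no assumption that NDI, Inconclusive, and DI are equally spaced.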



evaluated with magnitude of effect statistics (r and tau c in this report). These statistics result in a 
coefficient value that can vary between -1.0 and +1.0. A value of +1.0 indicates a perfect direct relationship; 
if a value on one variable is known, the value on the other variable is also known. A value of -1.0 also indicates 
a perfect relationship, but an inverse one. In a perfect inverse relationship, as one variable grows larger the 
other grows smaller at the same rate. A value of 0.0 indicates no relationship between the variables. If r is 
squared, the resulting value represents the amount of variance the two variables share in common. Probabilities 
associated with r and tau c represent the likelihood that those values could have been obtained by chance 
sampling. 

3 Binomial and z tests conducted on the classifications of Innocent and Guilty/Knowledgeable conditions were all 
conducted 1-tailed. All other statistical tests were conducted 2-tailed. 

4 The values shown in Table 1 in parentheses and boldface, (4), represent those programmed innocent subjects who 
made significant post-test admissions that were presumed to account for their being diagnosed as deceptive. 

Table 2. Summary statistics of the performance of the original examiners. 

                                          Agency 
                   1               2               3               4               ALL 

N                  57              52              46              52              207 

INCONCLUSIVE       (2/57)    4%    (4/52)    8%    (5/46)   11%    (1/52)    2%    (12/207)   6% 
RATES FOR: 
  INNOCENT         (1/35)    3%    (0/28)    0%    (3/26)   12%    (0/27)    0%    (4/116)    3% 
  KNOWLEDGE        (1/12)    8%    (1/12)    8%    (1/10)   10%    (0/13)    0%    (3/47)     6% 
  GUILTY           (0/10)    0%    (3/12)   25%    (1/10)   10%    (1/12)    8%    (5/44)    11% 
  G + K            (1/22)    5%    (4/24)   17%    (2/20)   10%    (1/25)    4%    (8/91)     9% 

INCORRECT          (11/57)  19%    (12/52)  23%    (13/46)  28%    (22/52)  42%    (58/207)  28% 
RATES (EXCLUDING INCONCLUSIVES) FOR: 
  INNOCENT         (0/33)    0%    (1/26)    4%    (2/22)    9%    (0/27)    0%    (3/108)    3% 
  GUILTY           (7/10)   70%    (4/9)    44%    (7/9)    78%    (10/11)  91%    (28/39)   72% 
  KNOWLEDGE        (4/11)   36%    (7/11)   64%    (4/9)    44%    (12/13)  92%    (27/44)   61% 
  G + K            (11/21)  52%    (11/20)  55%    (11/18)  61%    (22/24)  92%    (55/83)   66% 

CORRECT            (43/57)  75%    (34/52)  65%    (27/46)  59%    (29/52)  56%    (133/207) 64% 
RATES (EXCLUDING INCONCLUSIVES) FOR: 
  INNOCENT         (33/33) 100%    (25/26)  96%    (20/22)  91%    (27/27) 100%    (105/108) 97% 
  GUILTY           (3/10)   30%    (5/9)    56%    (2/9)    22%    (1/11)    9%    (11/39)   28% 
  KNOWLEDGE        (7/11)   64%    (4/11)   36%    (5/9)    56%    (1/13)    8%    (17/44)   39% 
  G + K            (10/21)  48%    (9/20)   45%    (7/18)   39%    (2/24)    8%    (28/83)   34% 

CONFIDENCE IN OUTCOMES: 
  DI               (10/10) 100%    (9/10)   90%    (6/9)    67%    (2/2)   100%    (27/31)   87% 
  NDI              (33/44)  75%    (25/36)  69%    (20/31)  65%    (27/49)  55%    (105/160) 66% 

MEAN ACCURACY      ((100%+48%)/2)  ((96%+45%)/2)   ((91%+39%)/2)   ((100%+8%)/2)   ((97%+34%)/2) 
                   74%             70%             65%             54% 

The tau c for all agencies combined was significant, tau c = 
0.34, p < 0.0001, but was of a modest magnitude. In comparison, 
the tau c derived from a recent mock crime study (Kircher & 
Raskin, 1988) was 0.87. Similarly, the original examiners in a 
field study of specific issue forensic polygraph examinations 
conducted by the United States Secret Service produced a tau c of 
0.76 (Honts, Raskin, Kircher, & Horowitz, 1988). 

Finally, effects in classification performance of the 
combined agencies were tested with a series of Kruskal-Wallis 
nonparametric analyses of variance (ANOVA). A significant 
difference in classification of Innocent and Guilty/Knowledgeable 
subjects was indicated by the first Kruskal-Wallis ANOVA, X²(1) 
= 17.50, p < 0.0001. This effect indicates that Innocent and 
Guilty/Knowledgeable subjects were classified differently. A 
second Kruskal-Wallis ANOVA indicated that there were significant 
differences in classification performance between the agencies, 
X² = 5.61, p < 0.05. 
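
The first Kruskal-Wallis result can be approximated from the grouped counts in Table 1. The sketch below is an illustration under stated assumptions, not the report's documented procedure: it assigns each decision category its mid-rank, omits the correction for ties, and excludes the four admitting Innocent subjects. Under those assumptions it gives a value close to the reported 17.50. The function name `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(table):
    """Kruskal-Wallis H from grouped ordinal counts, without tie correction.

    Each row of `table` is a group; columns are ordered outcome categories.
    Every observation in a category receives that category's mid-rank.
    """
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)

    # Mid-rank of each category: average of the ranks its members occupy.
    midranks = []
    ranks_used = 0
    for total in col_totals:
        midranks.append(ranks_used + (total + 1) / 2)
        ranks_used += total

    weighted = 0.0
    for row in table:
        group_size = sum(row)
        rank_sum = sum(count * rank for count, rank in zip(row, midranks))
        weighted += rank_sum ** 2 / group_size

    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Rows: Innocent, Guilty/Knowledgeable. Columns: NDI, Inc, DI.
h = kruskal_wallis_h([[105, 4, 3], [55, 8, 28]])
print(f"H = {h:.2f} on 1 degree of freedom")  # about 17.50
```

With only three decision categories, nearly all observations are tied, so a tie-corrected H would be considerably larger; the uncorrected form is shown only because it matches the figure in the text.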

An examination of the performance of the individual agencies 
suggested that there might be an interaction of Agency and 
Condition in the decision data. Since there is no nonparametric 
statistical test for interaction effects, a parametric Condition 
(Guilty/Knowledgeable, Innocent) by Agency (CIA, MI, NSA, OSI) 
ANOVA was conducted to investigate the possibility of 
interaction. This may violate the assumption of interval scale 
measurement of a parametric ANOVA. However, Kircher, Horowitz, 
and Raskin (1988) have argued that the decisions NDI, 
Inconclusive, DI represent an interval scale, and have used 
parametric statistics on such data. For purposes of this 
analysis and when necessary in some additional analyses, we also 
treated those decisions as an interval scale. ANOVA produced 
results that were similar to the Kruskal-Wallis ANOVAs, with 
significant main effects for Condition, F (1, 195) = 47.65, p < 
0.001, and Agency, F (3, 195) = 4.85, p < 0.01. ANOVA also 
indicated a significant interaction of Condition and Agency, F 
(3, 195), p < 0.05. That interaction appears to be primarily due 
to the very poor performance of Agency 4 with Guilty/Knowledgeable 
subjects. 

Agency 1. The classification results of the original 
examiners of Agency 1 are shown in Table 3, and they are 
summarized in Table 2. With Innocent subjects, the Agency 1 
examiners' classifications were 97% correct, 3% inconclusive, and 
0% incorrect, z = -5.31, p < 0.001. With Guilty/Knowledgeable 
subjects, the Agency 1 examiners' decisions were 46% correct, 4% 
inconclusive, and 50% incorrect. Excluding inconclusives, the 
classifications of Guilty/Knowledgeable subjects were not 
different from chance: 48% of those decisions were correct, and 
52% were false negative errors, Binomial, p < 0.50, ns. The X² 
for the Agency 1 decision table was significant, X²(2) = 19.32, p 
< 0.001, as was the tau c, tau c = 0.46, p < 0.001. 

Table 3. Classification results of the original examiners from 
Agency 1. 

              NDI     Inc     DI         Totals 

Innocent       33      1      (1)          35 

Guilty         11      1      10           22 

Totals         44      2      11           57 



Agency 2. The classification results of the original 
examiners of Agency 2 are shown in Table 4, and they are 
summarized in Table 2. With Innocent subjects, the Agency 2 
examiners' classifications were 96% correct, 0% inconclusive, and 
4% incorrect, z = -4.53, p < 0.001. With Guilty/Knowledgeable 
subjects, the Agency 2 examiners' decisions were 37% correct, 17% 
inconclusive, and 46% incorrect. Excluding inconclusives, the 
classifications of Guilty/Knowledgeable subjects were not 
different from chance: 45% of those decisions were correct, and 
55% were false negative errors, Binomial, p = 0.25, ns. The X² 
for the Agency 2 decision table was significant, X²(2) = 15.79, p 
< 0.001, as was the tau c, tau c = 0.50, p < 0.001. 

Table 4. Classification results of the original examiners from 
Agency 2. 

              NDI     Inc     DI         Totals 

Innocent       25      0      1 + (2)      28 

Guilty         11      4      9            24 

Totals         36      4      12           52 

Agency 3. The classification results of the original 
examiners of Agency 3 are shown in Table 5, and the rate data 
are summarized in Table 2. With Innocent subjects, the Agency 3 
examiners' classifications were 77% correct, 11% inconclusive, 
and 12% incorrect. Excluding inconclusives, the classification 
rates for Innocent subjects were significant: 87% of the 
decisions with Innocent subjects were correct and 13% were false 
positive errors, Binomial, p < 0.001. With Guilty/Knowledgeable 
subjects, the Agency 3 examiners' decisions were 35% correct, 10% 
inconclusive, and 55% incorrect. Excluding inconclusives, the 
classification of Guilty/Knowledgeable subjects was not different 
from chance: 39% of those decisions were correct, and 61% were 
false negative errors, Binomial, p = 0.24, ns. The X² for the 
Agency 3 decision table was not significant, but the tau c 
was significant although modest, tau c = 0.28, p < 0.05. 



Table 5. Classification results of the original examiners 
from Agency 3. 

              NDI     Inc     DI         Totals 

Innocent       20      3      2 + (1)      26 

Guilty         11      2      7            20 

Totals         31      5      10           46 



Agency 4. The classification results of the original 
examiners of Agency 4 are illustrated in Table 6 and they are 
summarized in Table 2. With Innocent subjects, the Agency 4 
examiners' classifications were 100% correct, 0% inconclusive, 
and 0% incorrect, z = -5.00, p < 0.001. With Guilty/Knowledgeable 
subjects, the Agency 4 examiners' decisions were 8% correct, 4% 
inconclusive, and 88% incorrect. Excluding inconclusives, 8% of 
the decisions with Guilty/Knowledgeable subjects were correct, 
and 92% were false negative errors, Binomial, p < 0.001. This 
result could be interpreted to indicate that the classification 
of Guilty/Knowledgeable subjects was "significantly worse than 
chance." X² was not calculated for the Agency 4 decision table 
since the expected frequencies were so low. The tau c, although 
very modest, was significant, tau c = 0.12, p < 0.05. 

Table 6. Classification results of the original examiners from 
Agency 4. 

              NDI     Inc     DI         Totals 

Innocent       27      0      0            27 

Guilty         22      1      2            25 

Totals         49      1      2            52 



Original Examiners' Confidence in Outcomes. 

Analysis of Variance (ANOVA) was used to examine differences 
in the ratings examiners gave on the confidence scales. Those 
ratings were subjected to a subject Gender (male/female) X 
Condition (Innocent, Knowledgeable, Guilty) X Agency ANOVA. None 
of the second or third order interactions were significant. The 
main effect for Gender was significant, F (1, 184) = 4.58, p < 
0.05, indicating that the examiners were more confident in their 
decisions on females (M = 4.27) than on males (M = 3.85). 
However, a Kruskal-Wallis ANOVA and a parametric ANOVA failed to 
find any difference in decision rates between genders, nor did 
gender interact with guilt condition in the decisions. 

Examiners were more confident in their decisions on Innocent 
(M = 4.12) and Knowledgeable subjects (M = 4.04) than they were in 
their decisions on Guilty subjects (M = 3.50), as was indicated by 
a significant main effect for Condition, F (2, 184) = 6.03, p < 
0.05. This finding suggests that examiner confidence in an 
outcome has little relationship to the accuracy of the outcome, 
since examiners were very accurate with Innocent subjects, but 
not very accurate with Knowledgeable or Guilty subjects. That 
hypothesis was explored by coding outcomes as Correct, 
Inconclusive, and Incorrect, and then correlating those codes 
with confidence in outcome. The resulting correlation was 
significant, r = -0.12, p < 0.05, but indicated that confidence 
and outcome share only 1.4% common variance. Agency 2 examiners 
were more confident in their decisions (M = 4.46) than were 
Agency 3 (M = 3.70), Agency 1 (M = 3.95), or Agency 4 (M = 3.75) 
examiners, as was indicated by a significant main effect for 
Agency, F (3, 184) = 4.20, p < 0.01. 

Real World Admissions by Subjects. 

The number and seriousness of real world admissions given by 
subjects during their examinations is illustrated by agency in 
Table 7. A Kruskal-Wallis ANOVA indicated that there was a 
significant difference between the agencies in the number of real 
world admissions they obtained, X²(1) = 17.89, p < 0.001. 
Agency 3 obtained the most real world admissions and Agency 4 the 
least. However, this is not at all surprising. Most of the real 
world admissions obtained were of security violations. Agency 4 
policy is that they are chartered to detect espionage and 
sabotage, and that they are not chartered to search for and/or 
report security violations. Since Agency 4 examiners neither 
look for nor report security violations, we should not expect them 
to obtain many admissions to security violations. 



Table 7. Percent real world admissions by severity and agency. 

                         Admission Severity 

Agency        None    Petty    Minor    Moderate    Significant 

Agency 1       81      12        5         2            0 
Agency 2       90       4        0         4            2 
Agency 3       68      15       11         4            2 
Agency 4       96       2        2         0            0 


Real world admissions were also examined in terms of whether 
they were obtained from subjects who were programmed Innocent, 
Knowledgeable, or Guilty. A Kruskal-Wallis ANOVA found no 
significant difference between the conditions. 

Blind Evaluations 

The confidence ratings of overall truthfulness made by the 
independent evaluators two months after the examinations are 
shown in Table 8. Those ratings were subjected to a series of a 
priori contrasts to see if the ratings for Innocent subjects were 
different from those of the Knowledgeable and the Guilty groups. 
The a priori contrasts indicated that the ratings for the 
Innocent group were different from those of the Knowledgeable 
group, t (198) = 2.7, p < 0.01, and from the combined 
Knowledgeable and Guilty groups, t (198) = 2.83, p < 0.01, but 
the ratings of the Innocent group were not different from the 
Guilty group. 

Table 8. Mean Ratings of Truth and Deception by the Independent 
Evaluators. 

Condition                Mean Rating (-5 = Deceptive, +5 = Truthful) 

Innocent                 1.43 

Knowledgeable            0.04 

Guilty                   0.49 

Guilty/Knowledgeable     0.26 



A decision analysis of the overall truthfulness ratings was 
conducted by converting the confidence ratings to decisions with 
varying inconclusive zones. Initially, ratings greater than 0 
were considered truthful, ratings less than 0 were considered 
deceptive, and ratings of 0 were considered inconclusive. The 
inconclusive zone was varied by one rating point in each 
direction until the inconclusive zone was -4 to +4 inclusive. 
The predictive validity of the decisions made this way was 
significant and peaked at a tau c value of 0.24, p < 0.05, when 
the inconclusive zone was 0. A decision table was created using 
this inconclusive zone for all examinations and all agencies 
and is presented as Table 9. Of the individual agencies only 
Agency 1 produced a significant discrimination of Innocent and 
Guilty/Knowledgeable subjects. 
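
The conversion of ratings to decisions can be sketched as follows. This is an illustrative reading of the procedure, not code from the study; it assumes the ratings are integers from -5 to +5 and that the zone is symmetric about 0, so a zone of 0 treats only a rating of exactly 0 as inconclusive. The names `rating_to_decision` and `zone` are hypothetical, and the ratings in the usage example are made up for demonstration only.

```python
def rating_to_decision(rating, zone=0):
    """Map a -5..+5 truthfulness rating to a decision.

    Ratings from -zone to +zone inclusive are called Inconclusive;
    higher ratings are Truthful and lower ratings are Deceptive.
    """
    if rating > zone:
        return "Truthful"
    if rating < -zone:
        return "Deceptive"
    return "Inconclusive"


# Made-up ratings, used only to show how widening the zone shifts calls.
example_ratings = [-5, -2, 0, 1, 3, 5]
for zone in range(0, 5):
    decisions = [rating_to_decision(r, zone) for r in example_ratings]
    inconclusive = decisions.count("Inconclusive")
    print(f"zone -{zone} to +{zone}: {inconclusive} of {len(decisions)} inconclusive")
```

Widening the zone trades conclusive decisions for inconclusive ones, which is why the analysis compares the resulting tau c values across zone widths rather than fixing a single cut point in advance.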



Table 9. Percent Decisions Based on Blind Evaluators' Ratings 

            Correct    Inc.       Incorrect   Correct   Inc.     Incorrect   tau 
Agency      Innocent   Innocent   Innocent    Guilty    Guilty   Guilty      c 

Agency 1      71         12         17          64        9        27        0.47* 
Agency 2      36         26         36          64        9        27        0.23 
Agency 3      71          4         26          43       14        43        0.25 
Agency 4      93          7          0           4       16        80        0.13 
Combined      67         13         21          42       12        46        0.24* 

* p < 0.05 

Accuracy on Single Relevant Questions. 

The original examiners' and the blind evaluators' decisions 
on single relevant questions were also evaluated. We have 
decisions from the original examiners on 1194 relevant questions. 
The accuracy of the original examiners on single relevant 
questions is shown in Table 10. With truthfully answered 
relevant questions, the original examiners were correct 88% 
of the time, incorrect 2% of the time, and called 10% 
inconclusive. Excluding inconclusives, 97% of the calls on 
truthfully answered questions were correct and 3% were false 
positive errors, z = -27.0, p < 0.001. When questions were 
answered deceptively, the original examiners were correct 20% 
of the time, incorrect 63% of the time, and called 17% 
inconclusive. Excluding inconclusives, 24% of the deceptively 
answered questions were correctly classified and 76% were false 
negative errors, z = -8.51, p < 0.001. Again, this could be 
interpreted as performance that was 'significantly below chance.' 
The discrimination between truthfully and deceptively answered 
relevant questions was significant, tau c = 0.21, p < 0.001, but 
was modest in magnitude. A similar analysis was performed for 
each agency and is summarized in Table 11. 

Table 10. Classification of single relevant question by the 
original examiners from all of the agencies. 

              NDI     Inc     DI      Totals 

Innocent      766      82     21        869 

Guilty        206      54     65        325 

Totals        972     136     86       1194 

Table 11. Percent Decisions On Single Relevant Questions by the Original Examiners 

            Correct    Inc.       Incorrect   Correct     Inc.        Incorrect   tau 
Agency      Truthful   Truthful   Truthful    Deceptive   Deceptive   Deceptive   c 

Agency 1      86         11          3          18          24          58        0.32** 
Agency 2      90          7          3          24          13          63        0.22** 
Agency 3      79         19          2          27          18          55        0.26** 
Agency 4      95          4          1           4          12          84        0.18* 
Combined      88          9          3          20          17          63        0.21** 

*  p < 0.05 
** p < 0.01 


The independent evaluators' decisions were also evaluated at 
the level of accuracy on single relevant questions. We have 
evaluations for the blind evaluators for 1318 relevant questions. 
The accuracy of the blind evaluators on single relevant questions 
is shown in Table 12. With truthfully answered relevant 



Table 12. Classification of single relevant question by the 



Independent evaluators from all of the agencies. 

NDI Inc DI 

Totals 

/ v 

Truthful ! 652 231 1 72 : 955 

; ± f . 

Deceptive ; 164 : 150 I 49 : 363 

\ / 

Totals 816 381 121 1318 



questions, the independent evaluators were correct 68% of the 
time, incorrect 8% of the time, and called 24% inconclusive. 
Excluding inconclusives, 91% of the calls on truthfully answered 
questions were correct and 9% were false positive errors, z = 
-21.52, p < 0.001. When questions were answered deceptively, the 
independent evaluators were correct 14% of the time, incorrect 
45% of the time, and called 41% inconclusive. Excluding 
inconclusives, 23% of the deceptively answered questions were 
correctly classified and 77% were false negative errors, z = 
-7.81, p < 0.001. The discrimination between Innocent and 
Guilty subjects was significant, tau c = 0.19, p < 0.05. A 
similar analysis was performed for each of the agencies and is 
summarized in Table 13. With single questions, only Agency 3 
failed to discriminate at a significant level. Agency 1 produced 
the most accurate decisions on single relevant questions. 
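
The z and tau c statistics for the independent evaluators can be approximately recomputed from the counts in Table 12. The report does not state its formulas, so the sketch below rests on two assumptions: that z is a normal approximation to a binomial test against chance (p = 0.5) with a 0.5 continuity correction, and that tau c is Stuart's tau-c for an ordered 2 x 3 table. The function names are illustrative only.

```python
import math

# Counts from Table 12; each row is (NDI, Inc, DI).
TRUTHFUL = (652, 231, 72)
DECEPTIVE = (164, 150, 49)


def z_vs_chance(correct, incorrect):
    """Normal approximation to a binomial test of `correct` against
    chance (p = 0.5), with a 0.5 continuity correction (assumed)."""
    n = correct + incorrect
    diff = correct - n / 2
    corrected = math.copysign(max(abs(diff) - 0.5, 0.0), diff)
    return corrected / math.sqrt(n * 0.25)


def stuart_tau_c(table):
    """Stuart's tau-c for an r x c table with ordered rows and columns."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for j2 in range(cols):
                    if j2 > j:
                        concordant += table[i][j] * table[k][j2]
                    elif j2 < j:
                        discordant += table[i][j] * table[k][j2]
    m = min(rows, cols)
    return 2 * m * (concordant - discordant) / (n ** 2 * (m - 1))


# Truthful questions: NDI is the correct call, DI the incorrect call.
z_truthful = z_vs_chance(TRUTHFUL[0], TRUTHFUL[2])
# Deceptive questions: DI is the correct call, NDI the incorrect call.
z_deceptive = z_vs_chance(DECEPTIVE[2], DECEPTIVE[0])
tau_c = stuart_tau_c([TRUTHFUL, DECEPTIVE])

print(f"z (truthful)  = {z_truthful:.2f}")
print(f"z (deceptive) = {z_deceptive:.2f}")
print(f"tau c         = {tau_c:.2f}")
```

Under these assumptions the magnitudes come out at about 21.52 and 7.81, with a tau c of about 0.19, in line with the values reported above. The sign of z depends on which outcome is counted as the success, so the positive sign for truthful questions here reflects this sketch's convention rather than the report's.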



Table 13. Percent Decisions on Single Relevant Questions by the Independent Evaluators 

            Correct    Inc.       Incorrect  Correct    Inc.       Incorrect  tau 
Agency      Innocent   Innocent   Innocent   Guilty     Guilty     Guilty     c 

Agency 1    74         24         2          10         56         34         0.33* 
Agency 2    62         19         19         18         40         42         0.15* 
Agency 3    52         33         15         16         40         44         0.06 
Agency 4    82         7          1          6          30         64         0.21* 
Combined    68         24         8          14         41         45         0.19* 

* p < 0.05 

Agency Evaluations of the Examination Materials 

Blind Evaluations. Evaluators from the agencies reviewed the 
case materials 1 year after the conclusion of the experiment to 
determine if the examinations were conducted according to their 
agency's standards. The majority of their responses to the items 
in the Blind Evaluation Questionnaire (Appendix B) fell around 
the 'about the same' response. An interesting finding was that 
the blind evaluators seemed to feel that the examinations they 
viewed would be more accurate if the subjects were Innocent (M = 
5.4) than if they were Guilty/Knowledgeable (M = 4.6), t (43) = 
3.30, p < 0.01. ANOVA was used to test for differences between 
agencies in their responses to items. The mean item ratings 
where ANOVA indicated significant differences between agencies 
are summarized in Table 14, and the mean item ratings where there 
were no differences between agencies are summarized in Table 15. 
(Due to a lack of available personnel, Agency 2 did not 
participate in the blind evaluations.) 

Non-Blind Evaluations. Generally, the ratings for the 
questionnaires were similar to the blind evaluations and fell on 
or around the 'about the same' or 'just like the field' choices 
of the scales. That is, the behavior of the original examiners 
was generally evaluated as not different from standard field 
practice. The ratings by the non-blind evaluators are summarized 
in Table 16 for items with significant differences between 
agencies and in Table 17 for items with no significant 
differences between agencies. 



Table 14. Mean Ratings of the Blind Evaluators 
With Significant Differences Between Agencies. 

                          Agency 1   Agency 3   Agency 4   F 
                          (N)        (N)        (N)        (df) 

PRETEST LIKE 
THE FIELD?                3.9        5.5        4.9        7.40, p < 0.01 
                          (14)       (6)        (24)       (2, 41) 
(7=JUST LIKE IT) 

EXAMINER'S 
DESCRIPTION 
OF POLYGRAPH              3.6        3.6        4.8        7.90, p < 0.01 
                          (14)       (6)        (24)       (2, 41) 
(4=ABOUT THE SAME) 

ADMONITIONS 
ABOUT MOVEMENT            4.5        4.3        3.7        7.90, p < 0.01 
                          (14)       (6)        (24)       (2, 41) 
(4=SAME AS FIELD) 

SIMILAR PRESENTATION 
OF RELEVANT?              3.9        4.7        5.5        13.40 
                          (14)       (6)        (24)       (2, 41) 
(7=JUST LIKE THE FIELD) 

SIMILAR PRESENTATION 
OF CONTROL?               4.0        5.3        5.3        6.90, p < 0.01 
                          (14)       (6)        (24)       (2, 41) 
(7=JUST LIKE FIELD) 

ALL p < 0.05 

Table 15. Mean Ratings of the Blind Evaluators 
No Significant Differences Between Agencies. 

                          Agency 1   Agency 3   Agency 4 

LENGTH OF PRETEST         4.3        4.2        4.3 
                          (14)       (6)        (24) 
(4=ABOUT THE SAME AS FIELD) 

EMPHASIS ON RELEVANT 
TYPICAL?                  4.0        4.5        4.1 
                          (14)       (6)        (24) 
(4=ABOUT THE SAME AS FIELD) 

EMPHASIS ON CONTROL 
TYPICAL?                  3.9        3.8        4.0 
                          (14)       (6)        (24) 
(4=ABOUT THE SAME AS FIELD) 

IF GUILTY, PRODUCE 
ACCURATE OUTCOME?         4.9        4.8        4.4 
                          (14)       (6)        (24) 
(7=VERY ACCURATE) 

IF INNOCENT, PRODUCE 
ACCURATE OUTCOME?         5.4        4.8        5.5 
                          (14)       (6)        (24) 
(7=VERY ACCURATE) 

EMPHASIS ON RELEVANTS 
BETWEEN CHARTS            4.7        3.3        4.0 
                          (3)        (3)        (4) 
(4=ABOUT THE SAME) 

EMPHASIS ON CONTROLS 
BETWEEN CHARTS            4.0        3.7        4.0 
                          (3)        (3)        (4) 
(4=ABOUT THE SAME) 

ALL p > 0.05, ns 

(N) 

Table 16. Mean Ratings of the Non-Blind Evaluators 
With Significant Differences Between Agencies. 

                                     AGENCY 
                            1      2      3      4      F 
                            (N)    (N)    (N)    (N)    (df) 

PRETEST LIKE THE 
FIELD?                      5.7    6.5    4.8    5.4    4.30, p < 0.01 
                            (9)    (15)   (15)   (12)   (3, 47) 
(7=JUST LIKE THE FIELD) 

DESCRIPTION OF POLYGRAPH 
TYPICAL?                    4.4    6.0    3.8    4.0    7.20, p < 0.001 
                            (9)    (15)   (15)   (12)   (3, 47) 
(7=MORE THAN THE FIELD) 

PRESENTATION OF RELEVANT 
THE SAME?                   5.9    6.6    5.0    5.5    4.70, p < 0.01 
                            (9)    (15)   (15)   (12)   (3, 47) 
(7=JUST LIKE THE FIELD) 

HOW WELL RELEVANT 
QUESTIONS COVERED 
SCENARIO                    6.0    6.8    4.7    3.8    13.10 
                            (9)    (15)   (15)   (12)   (3, 47) 
(7=COVER COMPLETELY) 

DID CONTROLS OVERLAP 
SCENARIO?                   6.9    ---    4.8    3.1    29.10 
                            (9)    (0)    (15)   (12)   (2, 33) 
(7=NO OVERLAP) 

Table 17. Mean Ratings of the Non-Blind Evaluators 
With No Significant Differences Between Agencies. 

                            Agency 1   Agency 2   Agency 3   Agency 4 

LENGTH OF PRETEST           4.2        3.8        3.9        4.3 
                            (9)        (15)       (15)       (12) 
(4=ABOUT THE SAME) 

PRESENTATION OF CONTROL 
SAME?                       5.9        0.0        4.8        5.3 
                            (9)        (0)        (15)       (12) 
(7=JUST LIKE THE FIELD) 

EMPHASIS ON RELEVANT 
IN PRETEST                  3.9        0.0        4.0        4. 
                            (9)        (0)        (15)       (12) 
(4=ABOUT THE SAME) 

EMPHASIS ON CONTROLS 
IN PRETEST SAME?            4.0        0.0        4.0        3.8 
                            (9)        (0)        (15)       (12) 
(4=ABOUT THE SAME) 

ADMONITIONS ABOUT 
MOVEMENT?                   4.0        0.0        3.9        3.8 
                            (9)        (0)        (15)       (12) 
(4=SAME AS FIELD) 

EMPHASIS ON RELEVANTS 
BETWEEN CHARTS              3.7        0.0        3.7        0.0 
                            (9)        (0)        (11)       (0) 
(4=ABOUT THE SAME AS FIELD) 

EMPHASIS ON CONTROLS 
BETWEEN CHARTS              4.1        0.0        4.0        0.0 
                            (9)        (0)        (11)       (0) 
(4=ABOUT THE SAME AS FIELD) 

ALL p > 0.05, ns 

(N) 

The Agency 2 evaluator rated the examiners' descriptions of 
the polygraph and related physiology as being more detailed than 
the field, and the Agency 2 rating on this item was significantly 
higher than the other agencies, F (3, 47) = 7.2, p < 0.001. 
Agency 4 rated the relevant questions significantly worse at 
covering the scenario than the other three agencies (see Table 
16). The differences between agencies on their ratings of 
emphasis of the relevants on the pretest were statistically 
significant, but do not differ greatly in magnitude. Agency 1 
rated the control questions' overlap as almost none compared to 
the other agencies, who rated the overlap as somewhere between 
all and none. 

Comments by the Agency Evaluators. The dominant theme 
expressed by the Agency 4 evaluators was a concern that the 
scenario situation did not have enough significance for the 
subject. A salient example offered was that the subject was a 
loyal officer and that his participation in the study was a 
constructive effort for his country. On the other hand, the 
control questions may have been perceived to be a greater threat, 
as questions about this person's overall honesty and integrity 
were more threatening than those relating to the scenario. In 
particular, the 'security control' questions (e.g., Have you ever 
discussed classified information over the telephone?) were 
thought to have been more relevant (as a result of real world 
experiences) than the programmed espionage (see footnote 5). 
Various cues from the examiners ('...this is just a scenario,' 
the exam is for 'security suitability,' etc.) were seen by the 
evaluators as an opportunity for rationalization on the part of 
the subject. 

The issue of availability for follow-up testing also seemed 
to be important to the Agency 4 evaluators. In one case it was 
the opinion of the evaluator that the subject was not fit to be 
tested at the time of the examination, although the original 
examiner tested him anyway. Other distractions also contributed 
to taking the focus off of the exam (e.g., the subject had other 
appointments after the polygraph examination). 

The Agency 3 reviews revealed flaws in certain tests. 
Against standard practice, one examiner gave instructions on how 
to control breathing, another emphasized the irrelevant 
questions, and a third interrogated on the control questions. 
However, the agency evaluators did not feel that these violations 
of standard practice invalidated any of the examinations. 



5 As a result of these comments, the hypothesis that control questions of a 'security nature' were too strong 
for a mock espionage experiment was tested in the Agency 4 data. Approximately half of the Agency 4 
Guilty/Knowledgeable subjects were asked one or two 'security' control questions while the other half were asked 
no 'security' control questions. Statistical analysis failed to reveal any association of the 'security' control 
questions with decisions at either the end of the first series of questions or at the conclusion of all testing. 

The Agency 2 reviews tend to focus on inadequate procedures 
on the part of the examiners (brief pretests, incomplete 
elaboration of relevant questions, etc.). There were, however, 
equally positive comments throughout. It appears that motivation 
on the part of the subjects was the major concern of the 
evaluators. 

Questionnaire Data 

The results of the analysis of the questionnaire data are 
presented in detail in Appendix C. There were two interesting 
findings. First, subjects in Experiment 1 described their 
emotional state during the examinations as being curious and 
hopeful rather than as being fearful, tense, or nervous. Second, 
pretest perceptions of how accurate polygraph tests were did not 
have any predictive validity for the outcome of the examination. 
That is, subjects who before the examination thought the 
polygraph did not work were just as likely to be correctly 
classified as those who thought the polygraph was very accurate. 
This finding does not support those critics who state that a 
belief in the accuracy of the polygraph is necessary for the 
technique to work (Lykken, 1981). Please see Appendix C for 
details of these analyses. 

DISCUSSION 

The results of Experiment 1 were surprising. In contrast to 
most of the scientific literature on the detection of deception, 
very few false positive errors and many more false negative 
errors were found. The bias for passing individuals was so 
strong that the overall performance on Guilty/Knowledgeable 
subjects was 'significantly poorer than chance,' and no 
individual agency performed at better than chance levels with 
Guilty/Knowledgeable subjects. The predictive validity of the 
screening examinations in this study was so poor that performance 
was at or near chance levels for two of the agencies. By way of 
comparison, the original examiners in a recent study of the 
forensic polygraph examinations given by the United States Secret 
Service (Honts et al., 1988; also reported as Raskin, Kircher, 
Honts, & Horowitz, 1988) accounted for more than six times the 
amount of variance, and the blind evaluator in a recent mock 
crime study (Kircher & Raskin, 1988) accounted for 8 times as 
much variance in the Guilt/Innocence criterion as did the original 
examiners in the present study. The independent evaluators in 
Experiment 1 generally performed about the same as the original 
examiners, with the exception of the independent evaluators of 
Agency 2, who performed at less than half the efficiency of their 
original examiners (tau c's of 0.23 and 0.50, respectively). 

The major unresolved question about Experiment 1 is whether 
the high false negative rate generalizes to security screening in 
the field, or whether it was an artifact of the experimental 
conditions. If the results do represent the field accuracy of 
security screening examinations, then there must be major 
differences, as yet undefined, between security screening and 
forensic polygraph examinations. In that case, it is necessary 
to determine what those differences are, and how their effects 
can be counteracted. 

However, it may be that the false negative rate obtained in 
Experiment 1 does not generalize to the field. In order to 
explore that possibility, we examined the execution of Experiment 
1, and the methodological differences between Experiment 1 and 
studies conducted in other laboratories. In that way any 
important methodological problems with Experiment 1 should become 
evident. 

One issue that might be raised concerns the extent to which 
the techniques used in Experiment 1 actually reflect those 
techniques used in security screening polygraphs given in the 
field. That issue has been examined and can be dismissed as a 
possible flaw in Experiment 1. The evaluators from the agencies 
did not find major differences between the way examinations were 
conducted in Experiment 1 and the way they were conducted in the 
field by the respective agencies. However, there were several 
differences between the methodology used in this study and that 
used in many of the other simulation studies of the detection of 
deception. 

The first difference concerns the number of issues in the 
examination. The screening examination is a multiple issue 
examination, while most forensic examinations are single issue 
examinations. Few research studies have examined multiple issue 
testing, and those that have produced results that suggest a 
decrease in predictive validity when multiple issue examinations 
are used, particularly if the subject is truthful to some issues 
but deceptive to others (Podlesny & McGhee, 1987; Raskin, 
Kircher, Honts, & Horowitz, 1988). Experiment 2 was designed to 
examine multiple issue testing. Subjects guilty of none, 1, 2, 
or 3 mock crimes were tested with either one multiple issue test, 
or with three single issue tests. The results of Experiment 2 
should provide some insight into the effects of testing multiple 
relevant issues within the same examination. 

A second difference between Experiment 1 and most other 
analog studies of the detection of deception concerns the delay 
between the enactment of the mock crime and the polygraph 
examination. Experiment 1 imposed a delay of about 2 months 
between the enactment of the espionage scenario and polygraph 
examination. Most other simulation studies have imposed no delay 
between the enactment of their mock crime and their examinations. 
A few recent studies (Honts, Hodes, & Raskin, 1985; Honts, 
Raskin, & Kircher, 1986; 1987; Podlesny & McGhee, 1987) had a one 
week delay between the mock crime and the polygraph testing. All 
of those studies have produced results comparable to other high 
quality studies in the literature that did not include a time 
delay. We examined the effects of a lengthy time delay on 
security screening examinations in Experiment 3. In Experiment 3 
some subjects were tested immediately after committing an act of 
mock espionage, and other subjects were tested 6 weeks later. 

A third difference between Experiment 1 and most other 
simulation studies of the detection of deception concerns the 
specificity of the relevant questions. The relevant questions 
used in the security screening examinations of Experiment 1 were 
worded in very general terms about committing unspecified 
security violations. Typically, in forensic polygraph 
examinations very specific relevant questions are used that deal 
with a single well defined act. It may be that the use of 
nonspecific relevant questions makes it easier for deceptive 
individuals to produce truthful outcomes. We examined the 
effects of using specific and nonspecific relevant questions in 
Experiment 3. Some subjects received typical screening 
nonspecific relevant questions. Other subjects received very 
specific questions about the scenario, questions much more like 
those typically used in forensic polygraph examinations. 
Experiment 3 was designed so that the specificity of the relevant 
questions was crossed with the time delay and guilt/innocence in 
a 2 X 2 X 2 factorial design. By crossing the three factors 
their possible interactions could also be examined. 
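
The 2 X 2 X 2 design described above can be laid out by crossing every level of each factor with every level of the others. The following Python sketch enumerates the resulting eight cells. The factor names and level labels are illustrative choices based on the description in the text, not terms taken from the report.

```python
from itertools import product

# Factor levels as described in the text; the labels are illustrative.
FACTORS = {
    "guilt": ("guilty", "innocent"),
    "delay": ("immediate", "6 weeks"),
    "question_type": ("specific", "nonspecific"),
}


def factorial_cells(factors):
    """Return one dict per cell of the fully crossed design."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = factorial_cells(FACTORS)
for number, cell in enumerate(cells, start=1):
    print(number, cell)
```

Because every combination of levels appears exactly once, main effects and the two-way and three-way interactions can all be estimated from the same set of cells.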

An important difference between Experiment 1, many other 
simulation studies of the detection of deception, and actual 
security screening examinations concerns motivation. In 
Experiment 1, the subjects were told that any real world 
admissions they made could be used against them, and guilty 
subjects were told that they should attempt to appear truthful 
and should not confess. However, there were neither benefits to 
the subjects if they passed their tests, nor penalties if they 
failed them. There is some evidence that the subjects in 
Experiment 1 were less aroused physiologically than were subjects 
in actual screening examinations. Heart rate data were 
calculated from the beginning of each subject's charts in 
Experiment 1, and a mean heart rate was calculated for all 
subjects, M = 75.9. Heart rate data were also obtained from 412 
individuals who took actual aperiodic screening examinations at 
Agency 2, M = 83.7. The difference between the heart rates in 
Experiment 1 and the Agency 2 subjects was significant, F (1, 
616) = 34.88, p < 0.001. 

Some research has suggested that motivation is an important 
variable in conducting simulation studies of the detection of 
deception. A recent meta-analysis (Kircher, Horowitz, & Raskin, 
1988) found that about 53% of the variance between the accuracy 
rates of simulation detection of deception studies was accounted 
for by level of motivation, with higher motivation producing more 
accurate results. Any differences in performance between 
Experiment 1 and Experiments 2 and 3 might provide some insight 
on this issue since all three experiments used the same level of 
motivation. We will return to the issue of motivation in the 
General Discussion section of this report. 

Experiment 2 

INTRODUCTION 

Screening tests and criminal investigative tests differ in 
the number of issues they cover. Criminal investigative tests 
are usually limited to one specific issue ("Did you steal that 
money?") or to a cluster of closely related issues ("Do you know 
who stole that money?" "Did you steal that money?" "Do you know 
where any of that stolen money is now?"). On the other hand, 
screening tests may cover several security issues, such as 
espionage, sabotage, or terrorism, and a variety of lifestyle 
issues. 

Few studies have examined the accuracy of the polygraph in 
multiple issue testing situations and only two studies (Barland, 
1981; Correa & Adams, 1981) dealt explicitly with screening 
situations. In general, these studies found that polygraph 
examinations were more accurate at discriminating completely 
truthful subjects from subjects who were attempting deception to 
something, than at the more difficult task of discriminating to 
which question(s) a person was attempting deception. 

Of the studies that have been concerned with multiple 
issues, only the study reported by Barland (1981) used federal 
polygraph screening procedures. That study examined the validity 
of Counterintelligence Screening Tests (a directed lie control 
test) in an analog experiment that used 56 INSCOM volunteers as 
subjects. Those subjects filled out a statement of personal 
history. Later, the 30 subjects assigned to the guilty group 
filled out a second statement of personal history, and they were 
required to lie to one of five items on this second statement. 
They were also instructed to lie to the same item on their 
subsequent polygraph test. All subjects were then tested by 
INSCOM examiners. Excluding the 16% inconclusive outcomes, 76% 
of the programmed innocent subjects and 81% of the programmed 
guilty subjects were correctly classified by those examinations. 
Decisions about deception to single questions were less accurate. 
Excluding the 15% inconclusive outcomes, 91% of the decisions on 
the questions answered truthfully, but only 63% of the questions 
answered deceptively, were correctly classified. 

Unfortunately, there are several factors in the Barland study 
that may limit its generalizability. First, the guilty subjects 
never attempted deception to more than one relevant question. In 
the field it is likely that persons engaged in espionage would 
have to attempt deception to several relevant questions. 
Second, the deception was related to falsification of a statement 
of personal history, rather than toward the usual issues of 
aperiodic screening examinations. Finally, the testing technique 
used in the Barland study is used only by INSCOM. 

Experiment 2 examined accuracy when the guilty subjects may 
have been lying to any one, any two, or all three issues on a 
three issue test. Mock espionage and sabotage paradigms were 
used. The relevant issues of the examinations were similar to 
those used in screening examinations. The testing technique was 
a control question test, the technique most commonly used in 
criminal investigations. In addition, Experiment 2 compared the 
accuracy of a single multiple issue test to the accuracy of three 
single issue tests. 

Method 

Subjects 

The subjects were 100 basic trainees at Ft. McClellan, 
Alabama who volunteered for the study. No pay or inducements 
were given to the trainees for volunteering, nor were they 
offered any reward for passing their polygraph examinations. 
They ranged in age from 18 to 32 with a mean of 20.2 years. 
Ninety-four of the subjects were males and 6 were females. 

Apparatus 

Lafayette all-electronic field polygraph instruments were 
used. Those instruments recorded respiration by means of an 
elastic, air-filled tube placed around the subject's chest. 
Relative blood pressure was measured by means of an arm cuff 
inflated to about 70 mm Hg placed on the subject's upper right 
arm. Vasomotor activity was measured by means of a photoelectric 
plethysmograph placed on the subject's left thumb. Skin 
resistance was measured by stainless steel plate electrodes 
attached to the palmar surface of the subject's left index and 
ring fingers. Skin conductance was measured by stainless steel 
plate electrodes attached to the palmar surface of the subject's 
left middle and little fingers. No electrolyte medium was used 
for either skin resistance or conductance measurement. The 
examinations were administered in the same exam rooms described 
in Experiment 1. All of the examinations were videotaped using 
procedures similar to those described for Experiment 1. 

Procedure 

Subjects were randomly assigned to one of four conditions of 
equal size. One condition was an innocent condition and the 
other three were guilty conditions. Subjects assigned to the 
first guilty condition enacted one of three possible acts of 
espionage or sabotage. Subjects assigned to the second guilty 
condition enacted two of the three possible acts, and the 
remaining guilty subjects enacted all three mock crimes. 
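
Random assignment to four equal-sized conditions, as described above, could be carried out along the following lines. This is only a sketch: the report does not say how the randomization was actually performed, and the condition labels, the function name, and the seed are hypothetical.

```python
import random

# Hypothetical labels for the four conditions described in the text.
CONDITIONS = ("innocent", "guilty_1_crime", "guilty_2_crimes", "guilty_3_crimes")


def assign_conditions(subject_ids, conditions=CONDITIONS, seed=None):
    """Randomly assign subjects to conditions of equal size."""
    if len(subject_ids) % len(conditions) != 0:
        raise ValueError("subjects must divide evenly among the conditions")
    per_condition = len(subject_ids) // len(conditions)
    # Build one slot per subject, then shuffle the slots.
    slots = [c for c in conditions for _ in range(per_condition)]
    rng = random.Random(seed)
    rng.shuffle(slots)
    return dict(zip(subject_ids, slots))


# 100 subjects, numbered 1 to 100; the seed is arbitrary.
assignment = assign_conditions(list(range(1, 101)), seed=1988)
```

Shuffling a pre-built list of condition slots guarantees equal group sizes, which independent coin flips per subject would not.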

Subjects were brought to the Polygraph Institute from their 
training area in groups of six to ten. They were briefed, as a 
group, on the purpose of the experiment. They were told that 
their participation was voluntary, and they were asked to sign 
the statement of informed consent. No subject refused to 
participate. After signing the consent form, the subjects were 
escorted to the examination room and instructed to wait until 
someone came for them. The polygraph examiners were kept in 
another part of the building to prevent their observing what was 
happening to the subjects. 

Guilty subjects were escorted one at a time to participate in 
the predetermined crime(s). Dispatch of the escorts was 
coordinated with walkie-talkies to prevent any subject from 
observing any other subject. Innocent subjects were also taken 
from the exam rooms for variable lengths of time to make their 
experience as equivalent as possible to that of the guilty 
subjects. 

The scenarios for the three crimes were as follows. Crime 1 
was the theft of a classified document. Subjects assigned 
to commit Crime 1 were escorted from the polygraph building to 
another building half a block away. While the escort talked with 
an office worker, the subject entered a walk-in vault, located a 
mock-classified document, and copied it on a nearby photocopier. 
The subject returned the document to its place in the vault and 
hid the photocopy on his or her person, where it remained 
throughout the polygraph exam. 

Crime 2 consisted of photographing classified equipment. 
Subjects were individually escorted to another nearby building, 
where Polaroid photographs were being made of some mock 
classified equipment. While the photographer took the escort 
into another room, ostensibly to ask some questions, the subject 
unobtrusively entered the room with the equipment and took a 
picture of it with the Polaroid camera. The subject hid the 
picture on his or her person, where it remained throughout the 
polygraph test. Just after the subject and escort left the 
building, the photographer came running out to say that the 
camera had been moved and to ask if either of them had touched 
it. Both denied having done so. 

Crime 3 was an act of sabotage. The subject was detailed to 
police a nearby parking lot for scraps of waste paper. The trunk 
of one of the cars in the lot was open, as if it were being 
unloaded. A box of mock classified radio tubes was visible in 
the trunk. A hammer was nearby. The subject smashed one of the 
tubes with the hammer and discarded the remnants in a trash can 
with the waste paper. The subject was surreptitiously observed 
to ensure that the crime was properly committed. 

The polygraph examinations were conducted by 13 instructors 
from the Defense Polygraph Institute. All were polygraph 
examiners trained at DPI or its predecessor, all were certified 
by their parent organizations, and all were experienced in field 
polygraph work. The examiners were selected on the basis of 
their familiarity with the general type of tests being given and 
their availability. The examiners were blind to the guilt or 
innocence of individual subjects, but they were briefed on the 
details of the three mock crimes so that they could conduct the 
tests realistically. 

Two different types of polygraph examinations were 
administered. Half of the subjects were tested with three single 
issue examinations, and half were given one multiple issue 
examination. The two types of examinations differed in the 
nature of the pretest and in the number of issues covered on each 
polygraph chart. 

Subjects who were given single issue tests were treated as 
if they were criminal suspects. That is, the examiner informed 
them that three crimes had been committed, and that there was 
reason to believe that the subject may have committed one or more 
of them. The subjects were given a Miranda warning, and their 
control questions were tailored to the individual subjects' 
personalities and the type of crime being covered. A stimulation 
(number) test was administered. Then, three single issue control 
question tests were conducted, one after the other. Each test 
covered one crime and consisted of three charts. The sequence in 
which the crimes were covered was systematically varied to 
control for possible sequence effects. The following question 
patterns were used in the three single issue tests. The 
abbreviations for question types are as follows: IR - 
Irrelevant, SR - Sacrifice Relevant, CQ - Control Question, RQ - 
Relevant Question. 

Crime 1 (Theft of Document): 

IR 1. Is today ________? 

SR 2. Do you intend to answer truthfully each question 
on this test? 

CQ 3. Prior to coming on active duty, did you ever 
steal anything from a member of your family? 

RQ 4. Did you steal that classified document? 

CQ 5. Prior to 1988, other than what you told me 
about, did you ever steal anything? 

RQ 6. Do you know where that classified document is 
now? 

CQ 7. While living in ________, did you ever steal 
anything from someone who trusted you? 

Crime 2 (Photography): 

IR 1. Is today ________? 

SR 2. Do you intend to answer truthfully each question 
on this test? 

CQ 3. Prior to coming on active duty, did you ever 
violate the trust of a family member? 

RQ 4. Did you photograph that classified equipment? 

CQ 5. Prior to 1988, other than what you told me, did 
you ever violate anyone's trust? 

RQ 6. Do you know where any photographs of that 
classified equipment are now? 

CQ 7. While living in ________, other than what you told 
me about, did you violate the trust of a 
friend? 

Crime 3 (Sabotage): 

IR 1. Is today ________? 

SR 2. Do you intend to answer truthfully each 
question on this test? 

CQ 3. Prior to coming on active duty, did you ever 
damage anyone's personal property? 

RQ 4. Did you smash that piece of classified 
equipment? 

CQ 5. While living in ________, did you damage anything? 

RQ 6. Do you know what was used to smash that piece of 
classified equipment? 

CQ 7. Prior to 1988, did you ever damage any public 
property? 

Thus, there were two relevant questions and three control 
questions in each of the three single issue tests. The multiple 
issue test administered to the remaining subjects used the same 
six relevant questions, but used only four control questions. The 
question sequence for the multiple issue test was: 

IR 1. Is today ________? 

SR 2. Do you intend to answer truthfully each question 
on this test? 

CQ 3. Before joining the Army, did you ever steal 
anything from a store? 

RQ 4. Did you steal that classified document? 

RQ 5. Do you know where that classified document is 
now? 

CQ 6. Prior to 1988, did you ever steal anything? 

RQ 7. Did you smash that piece of classified 
equipment? 

RQ 8. Do you know what was used to smash that piece of 
classified equipment? 

CQ 9. While in high school, did you ever damage 
anything? 

RQ 10. Did you photograph that classified equipment? 

RQ 11. Do you know where any photographs of that 
classified equipment are now? 

CQ 12. Between your 13th and 18th birthday, did you 
ever violate the trust of another? 



Regardless of the test outcome, no interrogation or 
additional testing was conducted. The charts were numerically 
scored by the examiner immediately following the test. The 
examiner scored respiration, skin resistance, relative blood 
pressure and vasomotor activity on a 7-point scale that ranged 
from +3 to -3. Scores were determined by comparing each 
physiological system at each relevant question against the 
greater of the two nearest control questions (one preceding, the 
other following the relevant question). The criteria for 
reactions were those taught at the Defense Polygraph Institute. 
Negative scores were assigned when the reaction to the relevant 
question was larger and positive scores were assigned when the 
reaction to the control question was larger. The magnitude of 
the score was dependent on the magnitude of the difference 
between the relevant and control question. The scores for each 
relevant question were summed across the four channels and the 
three charts. Scores of -3 or lower to any relevant question on 
a test resulted in a deceptive (DI) outcome. If the test was 
not deceptive, but any relevant question had a score between +2 
to -2 inclusive, the outcome was inconclusive. Only if the 
scores on all relevant questions were +3 or higher was the test 
categorized as truthful (NDI). 
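
The decision rule just described can be sketched in a few lines of Python. This is only an illustration of the cutoffs stated in the text; the function names and the nested-list layout of the raw scores are assumptions made for the example and are not part of the scoring procedure used by the examiners.

```python
# Illustrative sketch of the numerical decision rule described in the text.
# Function names and the data layout are hypothetical.

def question_total(scores_by_chart):
    """Sum one relevant question's scores across charts and channels.

    scores_by_chart holds one list per chart; each inner list holds one
    score per physiological channel, every score between -3 and +3.
    """
    return sum(sum(channel_scores) for channel_scores in scores_by_chart)


def classify_test(question_totals):
    """Classify one test from its summed relevant-question scores."""
    if any(total <= -3 for total in question_totals):
        return "DI"   # deceptive: any relevant question at -3 or lower
    if all(total >= 3 for total in question_totals):
        return "NDI"  # truthful: every relevant question at +3 or higher
    return "INC"      # inconclusive: everything else


# One relevant question scored on three charts and four channels.
example_question = [[1, 0, -1, 0], [0, -2, -1, 0], [0, -1, 0, -1]]
example_total = question_total(example_question)
example_outcome = classify_test([example_total, 4, 5])
```

Because a single relevant question at -3 or lower is enough for a DI outcome, the rule is asymmetric: a truthful outcome requires every relevant question to clear +3.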
Results 

Original Examiners' Classifications 

Table 18 displays the overall performance of the original 
examiners at the gross classification of individuals as either 
completely innocent or guilty of at least one crime. 

Table 18. Decisions of the original examiners in Experiment 2. 

                                   Decision 
Approach 
  Condition             NDI     INC      DI     TOTAL 

Multiple Issue Approach 
  Innocent                6       3       2        11 
  Guilty                  2      11      26        39 

Single Issue Approach 
  Innocent                5       6       1        12 
  Guilty                  3       4      31        38 

TOTALS                   16      24      60       100 

Decisions with the Multiple Issue approach on subjects who 
committed no crimes were 55% correct, 18% incorrect, and 27% 
inconclusive. Excluding inconclusives, 75% of these innocent 
subjects were categorized correctly. With the Multiple Issue 
approach, subjects who committed one or more crimes were called 
deceptive to at least one of the crimes 67% of the time and 
deceptive to none of the crimes 5% of the time, and 28% were 
reported as inconclusive. Excluding inconclusives, 93% of the 
Guilty subjects were classified as deceptive to at least one of 
the crimes. The χ² for the multiple issue portion of Table 18 
was significant, χ²(2) = 16.69, p < [...], as was the tau C = 
0.42, p < 0.001. 

Outcomes with the Single Issue approach on Innocent subjects 
were 42% correct, 8% incorrect, and 50% inconclusive. Excluding 
inconclusives, 83% of these innocent subjects were categorized 
correctly. With the subjects who committed one or more crimes, 
the Single Issue approach called 82% deceptive to at least one 
crime and 8% deceptive to no crimes, and 10% were called 
inconclusive. Excluding inconclusives, 91% of the Guilty 
subjects were classified as deceptive to at least one crime. The 
χ² for the single issue portion of Table 18 was significant, 
χ²(2) = 21.25, p < 0.01, as was the tau C = 0.54, p < 0.001. 
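
The χ² and tau C values reported for Table 18 can be checked directly from the cell counts. The sketch below is illustrative only: it assumes that the report's χ² is the ordinary Pearson statistic and that "tau C" is Kendall's tau-c (Stuart's formula), and the function names are hypothetical. Under those assumptions the computed values land within about 0.01 of the reported figures.

```python
# Pearson chi-square and Kendall's tau-c for a contingency table of counts.
# Rows are Innocent, Guilty; columns are NDI, INC, DI (both ordered).
# Assumption: these are the statistics the report calls "X 2" and "tau C".

def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def kendall_tau_c(table):
    """Stuart's tau-c: 2m(P - Q) / (n^2 (m - 1)), m = min(rows, cols)."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    m = min(n_rows, n_cols)
    return 2 * m * (concordant - discordant) / (n ** 2 * (m - 1))


multiple_issue = [[6, 3, 2], [2, 11, 26]]
single_issue = [[5, 6, 1], [3, 4, 31]]

chi_multiple, dof_multiple = chi_square(multiple_issue)
chi_single, dof_single = chi_square(single_issue)
tau_multiple = kendall_tau_c(multiple_issue)
tau_single = kendall_tau_c(single_issue)
```

Several expected cell counts for the Innocent rows are below 5, so the χ² approximation should be read with some caution for tables this small.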

Two Kruskal-Wallis one-way ANOVAs were conducted on these 
data. The first Kruskal-Wallis tested for effects of the 
Guilt/Innocence factor on decisions, and that effect was found to 
be significant, χ²(1) = 31.52, p < 0.01. The second analysis 
tested for an effect of the Approach (Single, Multiple), and that 
analysis was not significant. Possible interactions of Guilt and 
Approach were tested with a parametric Guilt X Approach ANOVA. 
That analysis found a significant main effect for Guilt, F(1, 
96) = 30.4, p < 0.001, but none of the other effects were 
significant. 

Performance was also examined at the level of accuracy of 
classifications for single crimes. Since there were no 
significant differences in classifications for the Approach taken 
to testing multiple issues, this analysis was collapsed⁶ across 
the Approach factor. Table 19 illustrates the accuracy of 
classification for each of the crimes with subjects who committed 
at least one crime. χ² analyses were conducted on the frequency 
tables for the three crimes and none were significant. Overall, 
only 33% of the outcomes on specific individual crimes were 
correct. The predictive relationships for crimes 1 and 2 produced 
significant tau C values, but they were in opposite directions, 
tau C = 0.28 and -0.22 respectively. These results indicate that 
the examinations were not able to determine which crime(s) had 
been committed. 
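
The overall 33% figure can be recovered from the Combined rows of Table 19. The sketch below is a rough check, not the authors' computation: it assumes that "correct" means NDI on crimes the subject did not commit and DI on crimes the subject did commit, and it rebuilds approximate counts from the rounded percentages in the table. The variable names are hypothetical.

```python
# Combined rows of Table 19, given as (N, % NDI, % INC, % DI).
truthful_on_crime = (77, 26, 38, 36)    # correct outcome assumed to be NDI
deceptive_on_crime = (154, 29, 35, 36)  # correct outcome assumed to be DI

# Approximate number of correct outcomes in each row.
correct_truthful = truthful_on_crime[0] * truthful_on_crime[1] / 100
correct_deceptive = deceptive_on_crime[0] * deceptive_on_crime[3] / 100

total_outcomes = truthful_on_crime[0] + deceptive_on_crime[0]
overall_percent_correct = (
    100 * (correct_truthful + correct_deceptive) / total_outcomes
)
```

With three possible outcomes per crime, a result near one third is about what would be expected if the crime-level outcomes carried no information about which crime was committed.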

Numerical Scores 

Possible differences between the numerical scores of the two 
multiple issue approaches were tested in several ways. First, a 
total numerical score was calculated for each subject, and the 
variance in those scores was decomposed with a Guilt (Innocent, 
Guilty) X Approach (Single, Multiple) ANOVA. That analysis 
indicated that Innocent subjects (M = 25.52) produced larger 
total numerical scores than did Guilty subjects (M = 1.76), as 
shown by a significant main effect for Guilt, F(1, 96) = 30.4, 
p < 0.001. There were no significant effects or interactions 
involving the Approach factor. The positive mean numerical score 
for Guilty subjects is not unexpected. Subjects guilty of only 
one or two crimes would be expected to produce negative 
numerical scores to some questions and positive numerical scores 
to others. Those expected question scores would combine to make 
the total numerical scores less extreme. 

Table 19. Percent accuracy for detecting which crime was 
committed by subjects who committed at least one crime. 

                                      NDI     INC      DI 

Crime 1 (Espionage) 
  Truthful on Crime (N = 25)           48      32      20 
  Deceptive on Crime (N = 52)          23      35      42 

Crime 2 (Photography) 
  Truthful on Crime (N = 26)           12      42      46 
  Deceptive on Crime (N = 51)          29      41      30 

Crime 3 (Sabotage) 
  Truthful on Crime (N = 26)           19      39      42 
  Deceptive on Crime (N = 51)          33      30      37 

Combined 
  Truthful on Crime (N = 77)           26      38      36 
  Deceptive on Crime (N = 154)         29      35      36 

⁶ The term 'collapsed' is used to indicate that two (or more) of the original 
conditions of the experiment were combined for additional analysis. 
Collapsing across a condition is justified after a demonstration that the 
grouping factor being collapsed across had no statistically significant 
effects. In this case, since there were no significant effects for the 
Approach taken to multiple issue testing on the classifications obtained, it 
is justifiable to remove the Approach as a grouping factor from any additional 
analyses on classifications. 

Possible differences between the crimes and between subjects 
based on the number of crimes they committed were tested by 
developing total numerical scores for the two relevant questions 
directed at each of the three crimes. Those crime total scores 
were then analyzed with a repeated measures analysis of variance 
(RANOVA) containing one repeated measures factor, Crime Total 
Score (3 levels), and two between subject factors, Approach 
(Single, Multiple) and Number of crimes committed (0, 1, 2, and 
3). There were no significant effects revealed by that 
analysis. 

Discussion 

The most important result in Experiment 2 was the finding of 
no differences in the Approach taken to testing multiple relevant 
issues. There were no differences between the use of one 
multiple issue control question test or three single issue 
control question tests in examiners' decisions, or in the more 
powerful tests of the numerical scores. Those results suggest 
that the multiple relevant issue testing approach was not a 
likely contributor to the poor detection of deception in 
Experiment 1. However, Experiment 2 did not examine the effect 
of the number of relevant questions addressed to relevant issues. 
In some of the question series in Experiment 1 only a single 
relevant question covered the acts of the scenario. The effects 
of the number of relevant questions devoted to the acts of 
deception remain to be determined. 

The accuracy levels achieved in Experiment 2 are better than 
those in Experiment 1. The tau C for the two approaches averages 
0.48. Since neither study offered any reward or punishment for 
passing or failing the examinations, this finding suggests that 
the lack of reward or punishment associated with examination 
outcomes was not the critical factor in the poor detection of 
deception in Experiment 1. However, the tau C obtained in 
Experiment 2 indicates that the examinations in this study still 
were not very good discriminators of guilt and innocence. The 
decisions in Experiment 2 account for only about a third as much 
variance in the Guilt/Innocence criterion as did the decisions of 
the Secret Service examiners in Honts et al. (1988), and only 
about a fourth as much as a recent mock crime study (Kircher & 
Raskin, 1988). Further, the mean numerical score for Guilty 
subjects was positive, rather than negative as predicted by the 
rationale of the control question test. These results leave open 
the possibility that the lack of explicit reward or punishment 
associated with examination outcomes in these experiments may 
still be a contributor to poor detection, and they are consistent 
with the analysis of Kircher et al. (1986), which indicates that 
the motivational structure is an important variable in detection 
of deception experiments. 
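
The "variance accounted for" comparisons can be made concrete with a little arithmetic. The sketch below assumes that the proportion of criterion variance is taken as the square of the tau C coefficient, which is a common convention but is not stated in the text. The implied coefficients it produces are back-calculations from the phrases "a third as much" and "a fourth as much"; they are not figures taken from the cited studies, and the function names are hypothetical.

```python
import math


def variance_accounted(coefficient):
    """Proportion of criterion variance, assumed to be the squared coefficient."""
    return coefficient ** 2


def implied_coefficient(share, ratio):
    """Coefficient of a study that accounts for `ratio` times as much variance."""
    return math.sqrt(share * ratio)


tau_experiment_2 = 0.48                             # average reported in the text
share_experiment_2 = variance_accounted(tau_experiment_2)

# Back-calculated coefficients implied by the stated ratios (assumption).
implied_if_one_third = implied_coefficient(share_experiment_2, 3)
implied_if_one_fourth = implied_coefficient(share_experiment_2, 4)
```

On this reading, the decisions in Experiment 2 account for roughly 23% of the variance in the criterion.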

One interesting finding of Experiment 2 was that the 
examinations did not detect deception at the level of the 
individual crimes. This result has important implications for 
examiners who must test on multiple relevant issues, as it 
suggests that the numerical scores associated with individual 
relevant issues may be a poor guide in choosing issues for 
interrogation. This result suggests that when deception is 
inferred, the interrogator may need to address all of the 
relevant issues of the examination with the interrogation. 

Experiment 3 

Introduction 

Experiment 1 differed methodologically from other detection 
of deception experiments in a number of ways. Almost all 
previous research on lie detection used relevant questions 
tailored specifically to the mock crime under investigation. The 
examinations in Experiment 1 usually used relevant questions that 
were worded very generally about rather broad categories of 
activity. The generality of screening questions could contribute 
to false negative errors by reducing the emotional impact or 
diffusing the salience of the relevant questions. Additional 
problems could arise with general relevant questions if the 
examiner did not completely define what is included in and 
excluded from each relevant question. If the relevant questions 
are somewhat ambiguous, guilty subjects might think that the 
relevant questions do not pertain to them and they might not 
respond. 

Another methodological factor that differentiated Experiment 
1 from previous research was the time lag between enacting the 
mock espionage and the running of the polygraph tests. In 
Experiment 1, two months elapsed between the scenarios and the 
polygraph tests. Some recent research (e.g., Honts, 1986; Honts, 
Hodes, & Raskin, 1985; Honts, Raskin, & Kircher, 1987; Podlesny & 
McGehee, 1987) has introduced intervals of several days or a week, 
but none has approached the two month time lag of Experiment 1. 
That amount of time could conceivably have blunted the programmed 
guilty subjects' emotional reaction to their scenarios. 

Experiment 3 investigated the effect of having a long 
interval between the enactment of the mock crime and the 
polygraph examination, and the effect of general versus specific 
relevant questions. 

Method 

Subjects 

Volunteers were initially solicited from among the 207 
subjects who had served in Experiment 1. Some had been 
reassigned from Ft. McClellan, but 83 Experiment 1 subjects 
volunteered to serve in Experiment 3. An additional 17 subjects 
similar to those used in Experiment 2 were recruited from the 
basic trainees at Ft. McClellan. None of the basic trainees had 
ever served as research subjects. None of the 100 subjects was 
paid to volunteer, and no explicit reward or punishment was 
associated with test outcomes. 

Examiners 

Fifteen instructors at the Defense Polygraph Institute served 
as examiners. All were trained at the Defense Polygraph 
Institute or its predecessor, all were federally certified 
examiners, and all were experienced in field polygraph work. 
Thirteen had served as examiners in Experiment 2. As in 
Experiment 2, examiners were selected on the basis of familiarity 
with the general type of test (screening or criminal 
investigation) and their availability (lack of conflict with 
other assigned duties). The examiners were blind to each 
subject's guilt or innocence, the base rate of deception, and the 
nature of the espionage scenario. 



Apparatus 



Lafayette field polygraphs were used to record respiration, 
cardiovascular activity, vasomotor activity, and the skin 
resistance response. The equipment was similar to that described 
for Experiment 2, except that a second respiratory channel was 
recorded instead of skin conductance. The examinations were 
conducted in May, 1988 in the same rooms used in the two earlier 
experiments. 



Procedure 



Experiment 3 was an unbalanced 2 X 2 X 2 factorial design 
that consisted of three between subject factors: the amount of 
time between enacting the crime and taking the polygraph 
examination (about 30 minutes versus six weeks), the specificity 
of the relevant questions (general versus specific), and guilt 
(guilty versus innocent). The 83 subjects who had participated 
in Experiment 1 were randomly assigned to the cells in the design 
matrix. However, the 17 basic trainees were available on the 
examination day only. Consequently, they could not be assigned to 
the long latency condition. They were randomly assigned only to 
the four cells (guilty/innocent, general/specific) in the short 
latency condition. The design and number of subjects in each 
cell was as follows: 

                                    Guilt 

                       Innocent              Guilty 

Latency           6 Weeks  Same Day     6 Weeks  Same Day     Total 

Question Type 
  General             5        5           20       20          50 
  Specific            5        5           20       20          50 

Total                10       10           40       40         100 



Subjects arrived at the Polygraph research annex either 
about six weeks prior to their polygraph test or on the day of 
the test. Each subject read a description of the study, signed a 
statement of consent, and then read instructions for their 
assigned condition (guilty or innocent). All subjects were given 
a sealed envelope and instructed to take it to the Institute's 
main building, across the road from the annex. They went to the 
office of the Director's secretary to deliver the envelope. The 
purpose of the delivery was to give both the innocent and guilty 
subjects access to a mock classified document on the secretary's 
desk. If a polygraph examiner inadvertently saw the subject in 
the secretary's office, he would not know if the subject was 
guilty or not. All subjects used the envelope as a pretext to 
get the secretary out of her office, during which time the guilty 
subjects had to locate and steal a mock classified document from 
the secretary's desk. 

After smuggling the stolen document out of the building, the 
guilty subjects read the document, then hid it in a tin can in a 
nearby assembly area for retrieval by another spy. Half of the 
subjects were tested six weeks following this initial activity 
and half were tested immediately. 

Two types of polygraph examinations were administered. Half 
of the subjects were treated as if they were criminal suspects 
and they were given an examination with very specific relevant 
questions. They were informed that a classified document had 
been stolen from a room they had had access to. They were given 
a Miranda warning and they were given a pretest interview similar 
to that used in criminal investigative examinations. The 
relevant questions of the specific relevant question examination 
were oriented to the theft of a specific classified document. 
The control questions were tailored to both the crime and the 
subject's personality. Typical test questions for the specific 
relevant condition were as follows: 

IR 1. Is today ? 

SR 2. Do you intend to answer truthfully each question 
on this test? 

CQ 3. While on active duty, did you ever steal any 
government property? 

RQ 4. Were you instructed to steal that secret 
document from the secretary's office? 

RQ 5. Did you steal that secret document from the 
secretary's desk? 

CQ 6. Other than what you told me, before 1988, did 
you ever steal anything? 

RQ 7. Did you at any time read that secret message? 

RQ 8. Did you hide that secret document for someone 
else to pick up? 

CQ 9. Prior to coming to Ft. McClellan, did you ever 
steal anyone's personal property? 

The other half of the subjects were administered a security 
screening type of examination with general relevant questions. 
They were not given a Miranda warning, and their pretest 
interview was similar to a counterintelligence screening 
examination. However, the relevant questions were not the normal 
counterintelligence questions. Only one or two of the questions 
normally included on counterintelligence screening tests would 
apply to the theft of a document from a secretary's desk. 
Consequently, if the criminal type test was found to be more 
accurate than the screening type, it could be due either to the 
specificity of the relevant questions or to the number of 
relevant questions dealing with the theft. To avoid that 
problem, both the question sequence and the number of relevant 
questions that the guilty would have to lie to were held the 
same. The only differences were the nature of the pretest 
interview, the specificity of the relevant questions, and the 
latitude of the examiner in selecting control questions. The 
questions used on the general relevant question test were: 

IR 1. Is today ? 

SR 2. Do you intend to answer truthfully each question 
on this test? 

CQ 3. Have you ever deliberately done anything 
dishonest? 

RQ 4. Have you ever planned to take classified 
documents without authorization? 

RQ 5. Have you ever committed an act of espionage 
against the US? 

CQ 6. Are you a really honest person? 

RQ 7. Have you ever participated in providing 
classified information to an unauthorized 
person? 

RQ 8. Have you ever removed classified defense 
material from a building without authorization? 

CQ 9. Have you ever lied to make yourself look 
important? 

Following the examination, the subjects were given a debriefing 
questionnaire similar to the one used in Experiment 1 (see 
Appendix A). 

Results 

Original Examiners' Classifications 

The overall performance of the original examiners is shown 
in Table 20. With Innocent subjects, the examiners' 
classifications were 90% correct and 10% incorrect, and none were 
inconclusive. With Guilty subjects, the examiners' 
classifications were 75% correct, 7% inconclusive, and 18% 
incorrect. Excluding inconclusives, 81% of the Guilty subjects 
were classified correctly, and 19% were false negative errors. 

Table 20. Decisions of the original examiners in Experiment 3. 

                        Decision 

Condition        NDI     INC      DI     Total 

Innocent          18       0       2        20 
Guilty            14       6      60        80 
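
The percentages quoted for Table 20 follow from the cell counts, with and without the inconclusive outcomes. The sketch below shows that arithmetic; the function name and the dictionary keys are hypothetical, and the zero inconclusive count for Innocent subjects is taken from the statement in the text that none were inconclusive.

```python
def outcome_rates(ndi, inc, di, guilty):
    """Return outcome proportions for one row of a decision table.

    A correct decision is DI for guilty subjects and NDI for innocent ones.
    """
    total = ndi + inc + di
    correct = di if guilty else ndi
    incorrect = ndi if guilty else di
    decided = correct + incorrect  # outcomes that were not inconclusive
    return {
        "correct": correct / total,
        "incorrect": incorrect / total,
        "inconclusive": inc / total,
        "correct_excluding_inc": correct / decided,
        "incorrect_excluding_inc": incorrect / decided,
    }


innocent_rates = outcome_rates(ndi=18, inc=0, di=2, guilty=False)
guilty_rates = outcome_rates(ndi=14, inc=6, di=60, guilty=True)
```

Excluding inconclusives changes only the Guilty row here, since no Innocent subject received an inconclusive outcome.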



The predictive relationship illustrated in Table 20 was 
tested in the several ways described for evaluating decisions in 
Experiment 1. A χ² analysis was conducted, and the χ² for Table 
20 was significant, χ²(2) = 38.68, p < 0.01. The tau C for the 
relationship illustrated in Table 20 was also significant, tau C 
= 0.46, p < 0.01. 

A series of Kruskal-Wallis one-way ANOVAs was used to test 
for the effects of Guilt, Time Lag, and Test Type on the 
decisions. Only Guilt produced a significant result, χ²(1) = 
44.3, p < 0.001. Examiner decisions were not affected by the 
time lag or the specificity of the test. A Time Lag X Test Type 
X Guilt parametric ANOVA was also conducted on the decision data 
to test for the possibility of interactions between the factors, 
and again only the main effect for Guilt was significant, F(1, 
93) = 87.2, p < 0.001. 

Numerical Scores 

Relevant Question Effects. The numerical scores were 
collapsed across the five physiological channels and were 
analyzed with a RANOVA. That analysis included three between 
subjects factors, Guilt (Innocent, Guilty), Time-Lag (Immediate, 
6 weeks), and Test Type (Specific, General), and two repeated 
measures factors, Chart (3 Levels) and Relevant Question (4 
Levels). The RANOVA found only two significant effects. The 
mean total numerical score for Innocent subjects (M = 35.4) was 
significantly larger than the mean total numerical score for 
Guilty subjects (M = -2.3), as was indicated by a significant 
main effect for Guilt, F(1, 92) = 69.32, p < 0.001. The other 
significant, but small, effect was an obscure 4-way interaction 
between Guilt, Time-Lag, Test Type, and Chart, F(2, 104) = 3.37, 
p < 0.05. 

Physiological Channel Effects. The numerical scores were 
collapsed across the four relevant questions and were then 
analyzed with a RANOVA. That RANOVA contained two repeated 
measures factors, physiological Channel (5; thoracic respiration, 
abdominal respiration, skin resistance, relative blood pressure, 
and finger pulse amplitude) and Chart (3), and three between 
subject factors, Guilt (Innocent, Guilty), Time-Lag (Immediate, 6 
weeks), and Test Type (Specific, General). As expected, this 
RANOVA revealed the same main effect for Guilt and the same 
interaction of Guilt, Time-Lag, Test Type, and Chart as was 
described above. This analysis also indicated a significant main 
effect for Channel, F(4, 368) = 13.01, and a significant 
interaction of Guilt and Channel, F(4, 368) = 7.72. The means 
representing these effects are shown in Table 21. 

Table 21. Mean numerical scores of the various physiological 
channels by guilt condition. 

Guilt                  TR      AR     SRR     RBP     FPA   Combined 

Innocent (n = 20)     3.8     4.6    13.4     6.2     7.5     35.4 
Guilty (n = 80)      -1      -0.9     0.3    -0.2    -0.6     -2.3 

TR = Thoracic Respiration 
AR = Abdominal Respiration 
SRR = Skin Resistance Response 
RBP = Relative Blood Pressure 
FPA = Finger Pulse Amplitude 

The main effect for Channel appears to be primarily due to skin 
resistance, which produced more positive means than the other 
channels. The interaction of Channel and Guilt appears to be due 
to the various channels being more or less effective with the 
Innocent subjects, while they were of approximately equal 
effectiveness with the Guilty subjects. None of the other main 
effects were significant. However, one other interaction was 
significant. The 3-way interaction of Guilt, Channel, and Test 
Type was significant, F(4, 368) = 3.87, p < 0.05, but is 
difficult to interpret. 

The magnitude of the predictive validity of the 
physiological channels was also assessed. Scores for each of 
the physiological channels and their total sum were collapsed 
across Time Lag, Chart, Relevant Questions, and Test Type and 
were then correlated with the guilty/innocent criterion and with 
each other. The resulting correlation matrix is presented as 
Table 22. All correlations were significantly different from 
zero. 



Table 22. Correlation matrix for the various physiological 
measures and the guilt criterion. 

                                                       Total 
            TR       AR      SRR      RBP      FPA     Score 

Guilt     -0.42    -0.46    -0.54    -0.52    -0.49    -0.65 
TR                  0.79     0.31     0.42     0.39     0.68 
AR                           0.34     0.49     0.49     0.73 
SRR                                   0.49     0.54     0.81 
RBP                                            0.54     0.77 
FPA                                                     0.78 

All correlations are significantly different than chance. 

TR = Thoracic Respiration 
AR = Abdominal Respiration 
SRR = Skin Resistance Response 
RBP = Relative Blood Pressure 
FPA = Finger Pulse Amplitude 
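
The criterion row of Table 22 can be ranked by absolute size to see which channel discriminated best. The sketch below is illustrative: the variable names are hypothetical, the finger pulse amplitude correlation is read as -0.49 (a correlation cannot lie outside the range -1 to +1), and squaring each coefficient to obtain a share of criterion variance is an added convention that the report does not state.

```python
# Correlations of each physiological channel with the guilt criterion,
# taken from the first row of Table 22.
criterion_correlations = {
    "TR": -0.42,   # thoracic respiration
    "AR": -0.46,   # abdominal respiration
    "SRR": -0.54,  # skin resistance response
    "RBP": -0.52,  # relative blood pressure
    "FPA": -0.49,  # finger pulse amplitude (reading assumed, see lead-in)
}

# Rank channels from most to least discriminating by absolute correlation.
ranked_channels = sorted(
    criterion_correlations,
    key=lambda channel: abs(criterion_correlations[channel]),
    reverse=True,
)

# Squared correlation as an (assumed) share of criterion variance.
shared_variance = {
    channel: round(r ** 2, 2) for channel, r in criterion_correlations.items()
}
```

The negative signs simply reflect the scoring direction: guilty subjects tend to receive lower numerical scores.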



Other than the total score, the skin resistance response produced 
the largest correlation with the criterion, indicating that it 
was the most discriminating channel, and thoracic respiration 
produced the smallest correlation with the criterion, indicating 
that it was the least useful predictor. 

Population Differences 

Both basic training personnel and other civilian and military 
subjects were used in this study. Analyses were conducted to 
test for the possibility of differences between the trainees and 
the other subjects. A physiological Channel X Guilt X Time Lag X 
Test Type X Subject Type (Trainee, Other) RANOVA of the total 
numerical scores revealed no differences between the two 
populations, nor any interactions of Subject Type with any of the 
other factors. 

Affective Responses 

During their debriefings, subjects in both Experiments 1 and 
3 gave ratings on a 10 point scale of 8 affective descriptors of 
their subjective responses during their polygraph examinations. 
Their mean responses and the associated standard deviations are 
presented in Table 23. The affective ratings were subjected to a 
RANOVA with Guilt and Study (Experiment 1, Experiment 3) as 
between subject factors, and one repeated measures factor, 
Descriptor (8 levels). The main effect of Descriptor was 
significant, F(7, 2074) = 51.65, p < 0.001, indicating that 
different ratings were given to different descriptors. The 
interactions of Descriptor and Guilt, F(7, 2074) = 8.39, p < 
0.001, Descriptor and Study, F(7, 2074) = 7.63, p < 0.001, and 
the 3-way interaction of Descriptor, Guilt, and Study, F(7, 
2074) = 5.31, p < 0.01, were all significant, but are not easily 
interpretable. Of more interest were significant main effects 
for Guilt, F(1, 296) = 37.28, p < 0.001, and Study, F(1, 296) = 
7.85, p < 0.01, and a significant interaction of Guilt and Study, 
F(1, 296) = 6.13, p < 0.01. The sources of these between 
subjects effects were examined. 

In order to determine which descriptors were actually 
different across the Guilt conditions, a series of univariate 
ANOVAs was conducted. The Descriptors that produced significant 
univariate main effects for Guilt were Nervous, F(1, 298) = 
5.74, p < 0.05, Tense, F(1, 298) = 10.07, p < 0.01, Guilt, F(1, 
298) = 98.0, p < 0.001, and Anxious, F(1, 298) = 25.6, p < 
0.001. The means for Guilty and Innocent subjects on these 
descriptors are shown in Table 24. 

Table 23. Means and standard deviations for the affective 
responses given by subjects in Experiments 1 and 3. 

                           Experiment 1            Experiment 3 

VARIABLE                 MEAN   S.D.     N       MEAN   S.D.     N 

FEAR                      3.5    2.5    206       3.5    2.8    96 
NERVOUS                   4.8    2.8    206       5.1    3.0    96 
BORED                     3.5    2.6    206       2.9    2.6    96 
TENSE                     4.5    2.6    206       5.1    2.9    96 
CURIOUS*                  8.3    2.2    206       7.1    2.9    96 
GUILT*                    3.2    3.0    206       5.2    3.7    96 
ANXIOUS                   4.7    2.9    206       5.1    3.0    96 
HOPE (OF NOT              6.8    3.5    206       5.4    3.9    96 
BEING CAUGHT)* 

*Significant difference between Experiment 1 and Experiment 3. 




Table 24. Means and standard deviations for the affective 
descriptors that differed significantly across the Guilt 
condition. 

                             Innocent                 Guilty 

VARIABLE                 MEAN   S.D.     N       MEAN   S.D.     N 

NERVOUS                   4.4    2.9    132       5.2    2.7    168 
TENSE                     4.1    2.7    132       5.0    2.6    168 
GUILT                     1.9    1.9    132       5.3    3.5    168 
ANXIOUS                   3.9    2.6    132       5.6    2.9    166 



Since there likely was a great deal of cognitive overlap 
between the descriptors presented to the subjects, discriminant 
analysis was used to determine which descriptor(s) actually 
discriminated between the Innocent and Guilty conditions. Four 
variables loaded into the significant discriminant solution. 
Tense, Guilt, and Anxious loaded as predictors with standardized 
discriminant function coefficients of 0.21, 0.94, and 0.39 
respectively. Fear loaded into the solution as a suppressor 
variable with a standardized discriminant function coefficient of 
-0.49. The coefficients indicate that most of the discrimination 
between Innocent and Guilty subjects was carried by Guilt. The 
fact that Nervous dropped out of the discriminant analysis 
indicated that Nervous was completely redundant with the retained 
variables. 

The main effect of Study (Experiment 1, Experiment 3) was 
decomposed in a similar manner. The Descriptors were subjected 
to a series of univariate analyses, and three were found to be 
significant: Curious, F(1, 298) = 14.08, p < 0.001, Guilt, F(1, 
298) = 28.11, p < 0.001, and Hope, F(1, 298) = 7.56, p < [...]. 
The means for these variables are shown in Table 23. A 
discriminant analysis was conducted on the descriptor ratings 
with Study as the criterion. Curious, Guilt, and Hope 
contributed significantly to the discrimination, with 
standardized discriminant function coefficients of 0.48, -0.84, 
and 0.38, respectively. Again the Guilt variable accounted for 
most of the discrimination. 

To decompose the interaction of Guilt and Study, univariate 
Guilt X Study ANOVAs were conducted on each of the Descriptors. 
Only the analyses of Guilt and Hope produced significant Guilt X 
Study interactions, F(1, 296) = 5.94, p < 0.05, and F(1, 296) = 
23.11, p < 0.01, respectively. The means for these two 
interactions are shown in Table 25. Innocent subjects in 
Experiment 3 reported feeling less guilt than Innocent subjects 
in Experiment 1, while Guilty subjects in Experiment 3 reported 
feeling more guilt than Guilty subjects in Experiment 1. 
Innocent subjects in Experiment 3 gave smaller ratings on the 
Hope descriptor than did Innocent subjects in Experiment 1. 

Table 25. Mean responses to the affective descriptors of Guilt and 
Hope, across Guilt and Study. 

Affective Descriptor 
  Condition              Experiment 1        Experiment 3 

Guilt 
  Innocent                2.82 (116)         [J.38] (16) 
  Guilty                  4.61 (92)           6.12 (76) 

Hope 
  Innocent               [e.se] (116)         1.31 (16) 
  Guilty                  6.59 (92)           6.32 (76) 

Discussion 

There are two major findings from Experiment 3. The first of 
those findings is that the time lag between the mock espionage 
and the polygraph examination had no effect on either the 
examiners' decisions or on the numerical scores. This result 
indicates that the time lag used in Experiment 1 probably did not 
contribute to the poor detection of deception. Interestingly, it 
also suggests that the inclusion of a time lag in analog studies 
of the detection of deception is probably not necessary for 
generalization. This is an important methodological finding that 
is supported by research at the University of Utah (Honts et al., 
1985, 1986, 1987; Horowitz, Raskin, Honts, & Kircher, 1988). 

The second important finding of Experiment 3 is that the 
specificity of the relevant questions had no effect on either 
decisions or numerical scores. This result suggests that the use 
of relevant questions with general wording in Experiment 1 
probably did not contribute to the poor rates of detection of 
deception. 

This study may also provide some insight into the question of 
motivation. As in Experiment 2, the examinations used in this
experiment were better discriminators of truth and deception than
were the examinations given in Experiment 1. The obtained tau c
of 0.46 in Experiment 3 indicates that the decisions in
Experiment 3 accounted for about twice the variance in the
guilt/innocent criterion as did the decisions in Experiment 1,
but only about a third as much variance as did the examiners'
decisions in Raskin et al. (1988), and about a fourth as much
variance as did the blind evaluator in a recent mock crime study
(Kircher & Raskin, 1988). Further, the mean numerical score from
the guilty subjects in Experiment 3 was only -2.3. This result
is closer to zero than would be predicted from either the
rationale of the control question test, or from most of the
analog detection of deception literature. There are a number of
factors that might account for that result, one of which is the
lack of explicit reward or punishment associated with the
relevant questions. It is conceivable that the lack of
motivation associated with the polygraph examinations' outcome
could have affected the results of all of the studies in this
report.
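
As a rough illustration of the statistic discussed above, the following sketch computes Kendall's tau c (Stuart's formula) from a criterion-by-decision table, and squares the reported tau c of 0.46 to approximate the proportion of variance accounted for. The table counts are hypothetical and are not data from these experiments; the function name tau_c and the variable names are assumptions made only for this illustration.

```python
def tau_c(table):
    """Kendall's tau c (Stuart's tau-c) for an ordered r x c table of counts."""
    n_rows, n_cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = 0
    discordant = 0
    # Compare every cell with every cell in a later row.
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for l in range(n_cols):
                    if l > j:
                        concordant += table[i][j] * table[k][l]
                    elif l < j:
                        discordant += table[i][j] * table[k][l]
    m = min(n_rows, n_cols)
    return 2 * m * (concordant - discordant) / (n * n * (m - 1))


# Hypothetical counts (NOT study data).
# Rows: innocent, guilty. Columns: NDI, inconclusive, DI.
example_table = [
    [30, 5, 5],
    [10, 10, 20],
]
example_tau = tau_c(example_table)

# Treating tau c like a correlation coefficient, its square approximates
# the proportion of criterion variance accounted for by the decisions.
variance_exp3 = 0.46 ** 2  # tau c reported for Experiment 3

print(example_tau, variance_exp3)
```

Squaring is what allows statements such as "about twice the variance": two tau c values are compared through the ratio of their squares, not through the ratio of the coefficients themselves.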

Experiment 3 found that the GSR was the most useful 
physiological measure. The GSR has been shown to be the most 
useful physiological measure in virtually every published study 
of the detection of deception, yet to date no major numerical 
scoring system has been altered to explicitly take advantage of 
this information. Research is underway at the Defense Polygraph 
Institute exploring ways of modifying the numerical scoring 
system to take optimal advantage of the GSR. 

An interesting methodological finding of Experiment 3 was 
that there was no difference in the accuracy of polygraph 
examinations given to support troops and to other personnel at
Fort McClellan. This is an important finding because it suggests 
that support troops are an acceptable subject pool for use in 
detection of deception research. 

The final topic for discussion in Experiment 3 concerns the 
affective descriptors endorsed by subjects in Experiments 1 & 3. 
In general, the subjects in both experiments did not strongly 
endorse the negative descriptors of fear, nervous, tense, 
anxious, guilt, or bored. They did strongly endorse the 
descriptor curious. These results suggest that the affective 
environment induced in these analog studies was not very similar 
to that in the field. To the extent that these studies did not
re-create the environment of the real world their
generalizability may be limited. We will return to the issue of
generalizability in the next section of this report.

There were some significant differences in the affective 
descriptors endorsed by the subjects in Experiments 1 & 3. The 
subjects in Experiment 3 reported less curiosity and hope but 
more guilt than the subjects in Experiment 1. These results are 
difficult to interpret, but suggest that the subjects found 
Experiment 3 to be relatively more negative than Experiment 1. 
Similarly, the interactions of Guilt and Study for the
descriptors Hope and Guilt are difficult to interpret, but 
generally seem to indicate that the subjects found Experiment 3 
to be a more negative experience. 
GENERAL DISCUSSION

The research conducted as Experiments 2 and 3 suggests that 
the three methodological issues raised in the discussion of 
Experiment 1 cannot adequately explain the poor detection in 
Experiment 1. Experiment 2 failed to find any problem 
specifically associated with the examination of multiple relevant 
issues within one question series. Experiment 3 failed to
indicate any effect of a time lag between the mock espionage and
the polygraph examination. Finally, Experiment 3 failed to find 
any problem with using relevant questions that are worded 
generally and presented in a screening examination as compared to 
relevant questions with very specific wording presented in a 
criminal investigative examination. 

The lack of explicit rewards or punishments associated with 
the outcomes of the examinations may make it easier for both 
guilty and innocent subjects to pass the test. In Experiment 1, 
subjects were told that admissions to any real world security 
violations would be adjudicated. Those instructions may have
increased the power of the control questions, possibly to the 
point of overwhelming relevant questions about the programmed 
scenarios. Those instructions represent a confounding factor in 
the results of Experiment 1. The effects of motivation on the 
detection of deception need to be systematically examined in 
future research. Thus, the studies reported here may have 
overestimated the number of false negative errors and 
underestimated the number of false positive errors in the field. 
Since these uncertainties about the effects of the laboratory
remain, strong generalization of the results of these studies to
the field is not possible. However, despite those uncertainties,
it seems likely that these studies accurately reflect trends
in the real world.

One way to estimate the generalizability of the results of
experiments is to use real world outcome rates and a conditional
probability analysis to map the experimental outcomes on to a
real world data set. The Department of Defense Polygraph Program
Report to Congress for Fiscal Year 1986 and the Department of
Defense Polygraph Program Report to Congress for Fiscal Year 1987
provide one such data base. During fiscal years 86 and 87, DoD
components conducted 8599 security screening examinations under
the congressional test program. Of those 8599 examinations, no
opinion was rendered on 11 cases, 7 were reported as
inconclusive, 8528 were reported as no deception indicated, and 
53 were reported as deception indicated. All of the cases 
reported as deception indicated were confirmed by confession. 
Most of the reported confessions were to acts classified as 
security violations, rather than to espionage. These data can 
be used as a base for a conditional probability analysis. 

However, the DoD reports make no estimate of the base rate of 
deception. We decided that a rough estimate of the base rate of 
security violation targets could be obtained from Experiment 1. 
In Experiment 1 the three agencies that stressed gaining
admissions obtained a real world admission rate of about 20%.
This is likely to be a conservative estimate of the actual base
rate for security violations, since it represents only those
individuals who actually admitted violations. Almost certainly
there were additional individuals who had committed security
violations but did not admit them. However, an accurate estimate
of the actual base rate is not available, and for purposes of
this discussion we decided to use 20% as our base rate of
deception for a conditional probability analysis of the data from
the DoD reports (see Footnote 7). Additionally, in order to simplify the
analysis we are ignoring no opinion and inconclusive outcomes.

A conditional probability analysis using the overall results 
from Experiment 1 produces results somewhat similar to those 
predicted by the studies of criminal investigative examinations (see Footnote 8).
That is, our conditional probability analysis predicted that 86%
of the NDI outcomes should be correct, but only 74% of the DI
outcomes were predicted to be correct. In other words, there
should be a large number of false positive errors. However, the
DoD reports do not provide support for this analysis. The above
analysis predicts 789 DI outcomes, but only 53 DI outcomes were
reported. This result suggests that the overall estimates of 
accuracy obtained from Experiment 1 are not an accurate 
reflection of screening in the congressional test programs. 
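
The conditional probability analysis described above can be sketched in a few lines. This is a minimal illustration, assuming the 20% base rate and the overall Experiment 1 accuracy rates (34% of guilty and 97% of innocent subjects classified correctly); the function name predicted_table and the dictionary keys are hypothetical and are not part of the original analysis.

```python
def predicted_table(n_decisions, base_rate, hit_rate, correct_rejection_rate):
    """Predict a 2x2 outcome table from a base rate and two accuracy rates."""
    n_guilty = round(n_decisions * base_rate)
    n_innocent = n_decisions - n_guilty
    guilty_di = round(n_guilty * hit_rate)                      # true positives
    innocent_ndi = round(n_innocent * correct_rejection_rate)   # true negatives
    guilty_ndi = n_guilty - guilty_di                           # false negatives
    innocent_di = n_innocent - innocent_ndi                     # false positives
    total_di = guilty_di + innocent_di
    total_ndi = guilty_ndi + innocent_ndi
    return {
        "guilty_di": guilty_di,
        "guilty_ndi": guilty_ndi,
        "innocent_di": innocent_di,
        "innocent_ndi": innocent_ndi,
        "total_di": total_di,
        "total_ndi": total_ndi,
        # Confidence in an outcome = share of that outcome that is correct.
        "confidence_di": guilty_di / total_di if total_di else None,
        "confidence_ndi": innocent_ndi / total_ndi if total_ndi else None,
    }


# Decisions rendered in the DoD reports: 8599 examinations, less 11
# no-opinion cases and 7 inconclusive outcomes.
n_decisions = 8599 - 11 - 7

overall = predicted_table(n_decisions, base_rate=0.20,
                          hit_rate=0.34, correct_rejection_rate=0.97)
print(overall)
```

Under these assumptions the sketch predicts 789 DI outcomes, of which 583 are guilty and 206 innocent, so confidence in a DI outcome is about 0.74. Confidence in an NDI outcome computes to about 0.855.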

However, a closer examination of the results of Experiment 1 
suggests that the Agency 4 examiners were performing most like the
examiners whose results are reported by DoD. When we performed a
conditional probability analysis using the Agency 4 accuracy rates
from Experiment 1 on the population of 8581 DoD examinations, and
used a base rate of 20%, we predicted no false positive errors
and expected to correctly detect 137 guilty individuals (see Footnote 9). The
DoD reports found no false positive errors and reported 53 people
as deceptive. Of course, the important implication from this
analysis is that it suggests that there were more than 1,500
individuals who committed security violations, but were cleared
by the polygraph (however, see Footnote 7).
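
Because the predicted counts depend heavily on the assumed base rate, it may help to see how they move when that assumption changes. The sketch below applies the Agency 4 rates from Experiment 1 (8% of guilty subjects detected, no innocent subjects called deceptive) to the 8581 DoD decisions. Only the 20% base rate comes from this report; the other base rates are hypothetical values chosen for illustration, and the names used are assumptions.

```python
N_DECISIONS = 8581          # decisions rendered in the DoD reports
HIT_RATE = 0.08             # Agency 4 detection rate for guilty subjects
FALSE_POSITIVE_RATE = 0.0   # Agency 4 called no innocent subject deceptive


def predict_outcomes(base_rate):
    """Predicted detections, misses, and false positives for one base rate."""
    n_guilty = round(N_DECISIONS * base_rate)
    n_innocent = N_DECISIONS - n_guilty
    detected = round(n_guilty * HIT_RATE)
    return {
        "base_rate": base_rate,
        "guilty": n_guilty,
        "detected": detected,
        "missed": n_guilty - detected,
        "false_positives": round(n_innocent * FALSE_POSITIVE_RATE),
    }


# 0.20 is the base rate assumed in the text; the others are hypothetical.
for rate in (0.01, 0.05, 0.10, 0.20):
    print(predict_outcomes(rate))
```

At the 20% base rate the sketch reproduces the 137 predicted detections and the more than 1,500 predicted misses. Lower base rates shrink both figures in proportion, which is why the choice of base rate matters so much to the conclusion.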



Footnote 7. The following points should be considered in evaluating the two conditional
probability analyses that follow. First, our assumption of a 20% base rate of
deception is not reasonable for agencies that are not concerned with detecting
security violations. Agency 4's screening program is directed at acts of
espionage and sabotage and does not recognize security violations as within
the scope of their polygraph program, as defined under DoD Directive 5210.48.
Since Agency 4 does not even record the security violations that are reported by
their subjects, the base rate of targeted deception for their programs must be
considered to be very low. Second, these conditional probability analyses are
based on real world data obtained from the congressional test program within
the Department of Defense. Therefore, the results may not apply to programs
not included in the DoD reports to Congress.
If this analysis is correct, then it is important to consider
why the results of screening tests are so biased toward NDI 
calls. A possible line of reasoning is not difficult to develop. 
The general tone of the scientific testimony before congressional 
committees has been that there inevitably must be a large number 
of false positive errors in mass screening, and this must be so 
unless the discriminator is nearly perfect with innocent 
individuals. Knowing the dire predictions of large numbers of 
false positive errors, it is possible that the individuals who
set up the extant screening programs built in as many safeguards
as possible against making false positive errors. However, they
may have gone too far in protecting against false positive
errors. We may have a system that is efficient at avoiding false
positive errors at the expense of missing the targets it was 
designed to catch. 

Discussions with experienced screening examiners indicate
that there are a variety of pressures acting upon the examiners 
to clear as many examinees as possible. The primary pressure 
appears to stem from the knowledge that the proportion of actual 
espionage agents within the population being tested is extremely 
small. It is no wonder that if the person taking the test is 
having trouble clearing it, many examiners feel that they (the 
examiners) must be doing something wrong. We have been told that 
sometimes examiners run repeated tests until the physiological 



Footnote 8. Conditional probability analysis assumptions:

Population of cases where a decision was rendered: 8581

Base rate of guilt: 20%, therefore: Number of Guilty = 1716
and Number of Innocent = 6865

Accuracy rates from the combination of the agencies from
Experiment 1: Guilty = 34% Correct, Innocent = 97% Correct

Predicted Classification Table

                DI        NDI     Totals

Guilty         583       1133       1716

Innocent       206       6659       6865

Totals         789       7792       8581

Confidence in a DI outcome = 74% (583/789)
Confidence in a NDI outcome = 86% (6659/7792)
reactions disappear without any significant admissions having
been made. Although polygraph program managers try to combat 
that attitude, it is encouraged by other aspects of the system. 
Examiners who clear more subjects than the average examiner are 
often promoted more rapidly. Conversely, examiners who 
consistently fail to clear enough subjects on the first series 
are 'given help'. In such a testing environment, the results of
Experiment 1 may reflect the condition of the real world. Given 
the astonishingly small number of positive outcomes reported in 
the DoD program, it seems likely that the DoD screening programs 
are missing a lot of security problems. 

The screening situation suggested by the results of these 
studies and analyses is not good, but some positive aspects were 
indicated. The government does obtain the benefit of uncovering 
some security problems. A detection rate of 34% was demonstrated 
across the agencies. Some utility of the polygraph test was 
demonstrated for several of the Agencies in Experiment 1 by the 
substantial number of real world security problems that were 
discovered. Without the polygraph, it is likely that none of the 
problems would have been uncovered. The ability to detect some 
problems is better than detecting none at all. Furthermore, any 
possibility of being caught may deter potential spies and reduce 
security violations. 



Footnote 9. Conditional probability analysis II assumptions:

Population of cases where a decision was rendered: 8581

Base rate of guilt: 20%, therefore: Number of Guilty = 1716
and Number of Innocent = 6865

Accuracy rates from Agency 4 in Experiment 1:
Guilty = 8% Correct, Innocent = 100% Correct

Predicted Classification Table

                DI        NDI     Totals

Guilty         137       1579       1716

Innocent         0       6865       6865

Totals         137       8444       8581

Confidence in a DI outcome = 100% (137/137)
Confidence in a NDI outcome = 81% (6865/8444)
The security screening problem is difficult, and may be very
difficult to solve. The situation could be improved by improving 
our ability to detect deception, and by making better use of all 
of the available information. However, given the current state 
of the practice in the field that may be very difficult to 
accomplish. At present, at least four questioning techniques are 
used and none have received sufficient scientific evaluation. 
Further, we were told that in at least one agency chart 
evaluation varies from office to office, and perhaps from one 
quality control officer to another. Different agencies have very 
different perceptions about how examinations should be conducted 
and about what are the appropriate targets of their screening 
programs. Standardization driven by research is needed. 

There are a number of approaches that research could offer to 
improve the situation. Statistical approaches to decision making 
would surely help reduce the unreliability in the current 
systems. Discriminant analysis procedures that make explicit use 
of base rate information are one step that could provide an 
immediate benefit, and they are currently available. New
approaches to analysis of physiological responses also hold
promise. For example, actuarial decisions can be made on the
basis of the pattern of physiological responses to relevant
questions, and those decisions were demonstrated to be more
accurate than decisions based on numerical scoring in one study
(Honts, Kircher, and Raskin, 1988). New physiological measures
may improve our ability to detect deception. However, all of
these avenues require research, and the need is urgent if the 
results of Experiment 1 tell us anything about the current 
performance of counterintelligence screening examinations. 
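
To make concrete what explicit use of base rate information means in a statistical decision rule, the following sketch computes the posterior probability of guilt from a numerical score under a two-group, equal-variance normal model. The group means and standard deviation are hypothetical values, not estimates from these experiments; the only convention borrowed from this report is that negative numerical scores point toward deception.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def posterior_guilt(score, base_rate,
                    guilty_mean=-6.0, innocent_mean=6.0, sd=8.0):
    """P(guilty | score) by Bayes' rule; means and sd are hypothetical."""
    guilty_term = base_rate * normal_pdf(score, guilty_mean, sd)
    innocent_term = (1.0 - base_rate) * normal_pdf(score, innocent_mean, sd)
    return guilty_term / (guilty_term + innocent_term)


# The same score supports very different conclusions at different base rates.
for base_rate in (0.05, 0.20, 0.50):
    print(base_rate, posterior_guilt(-4.0, base_rate))
```

The design point is that the score alone does not fix the decision: the same physiological evidence yields a lower probability of guilt in a population where deception is rare, which is the situation screening programs face.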
ANOVA -- Analysis of variance. A powerful statistical
    technique used in complex factorial experiments
    to determine if the contribution of the various
    factors (independent variables) and their
    combinations (interactions) to the total
    variation in the results is significantly
    different than that expected by chance. Also
    see t-test and RANOVA.

Cardio -- Short for cardiosphygmograph, one of the three
    channels usually recorded by field polygraph
    instruments. It provides a measure of relative
    blood pressure by measuring changes in the
    volume of the upper arm (sometimes the lower)
    by means of a pressure cuff.

Chi square (χ²) -- A type of statistical test (named after the
    Greek letter chi) used in this study to
    determine if the number of subjects in the
    various outcomes were distributed by chance.

CIA -- Central Intelligence Agency.

Correct Rejection -- In lie detection, diagnosing an 'Innocent'
    person as not deceptive (NDI or NSR).

DI -- Deception Indicated. A polygraph outcome in
    which the examiner concludes that the person is
    deceptive or concealing information. It is
    synonymous with SPR (specific reaction), and
    the opposite of NDI (no deception indicated)
    and NSR (no specific reaction).

False negative -- A polygraph outcome in which a deceptive
    ('guilty') person is erroneously diagnosed as
    truthful by the examiner.

False positive -- A polygraph outcome in which a truthful
    ('innocent') person is erroneously diagnosed as
    deceptive by the examiner.

FN -- See false negative error.

FP -- See false positive error.

GSR -- Galvanic skin response. One of the three
    physiological measures usually recorded by
    field polygraph instruments. It represents an
    emotional sweating response. The specific
    measurement of GSR used in this study is the
    skin resistance response (SRR).

Guilty -- In this report, a guilty person was one who had
    committed a mock crime, such as the theft of a
    mock classified document, and who was
    instructed to lie about their involvement on
    the polygraph. If the examiner concludes he is
    deceptive, the result is a "hit" (true
    positive). If the examiner clears him, the
    result is a "miss" (false negative). If the
    outcome is inconclusive, it is IG (an
    inconclusive outcome on a guilty person).

Hit -- In lie detection, calling a 'Guilty' person
    deceptive (DI or SPR).

IG -- An inconclusive outcome on a person programmed
    to be guilty or knowledgeable.

II -- An inconclusive outcome on a person programmed
    to be innocent.

Inconclusive -- The outcome of a polygraph examination when the
    examiner is unable to make a decision about a
    person's truthfulness. It is usually not
    considered to be an error. However, in
    screening situations the practical result is
    similar to a DI outcome, in that further
    investigation is required and clearance may be
    withheld pending resolution.

Innocent -- In this report, a person who was not programmed
    to be guilty or knowledgeable. Some programmed
    innocent persons may in fact not be innocent if
    they concealed significant real-life
    information from the polygraph examiner.
    Generally, a truthful outcome with an
    'Innocent' subject is considered to be a
    correct decision (true negative, correct
    rejection), while a deceptive outcome with an
    'Innocent' is generally considered to be an
    incorrect decision (false positive, false
    alarm). However, if, following a deceptive
    outcome, the programmed innocent person admits
    to concealing real-life information, that
    outcome may be considered to be a correct
    decision (true positive, hit).

INSCOM -- US Army Intelligence and Security Command.
    See MI.

Knowledgeable -- In this report, a person who is programmed to
    have knowledge of someone who has committed a
    mock crime, but who did not commit the crime
    himself. If the examiner concludes that the
    person is lying or concealing information on
    the test, the outcome is considered to be a
    correct decision (hit, true positive). If the
    outcome is NDI, it is considered to be an
    incorrect decision (miss, false negative).

MI -- Military Intelligence. In this report, MI
    refers specifically to the US Army Intelligence
    and Security Command (INSCOM) and its
    subordinate elements.

Miss -- An error of diagnosing a 'Guilty' subject as
    truthful (NDI or NSR).

NDI -- No Deception Indicated. A polygraph outcome in
    which the examiner concludes that the person
    was truthful, and was not holding back any
    significant information. Synonymous with NSR.
    The opposite of DI and SPR.

NSA -- National Security Agency.

NSR -- No specific reaction. NSA examiners use this
    term in preference to NDI, to indicate that a
    person appeared truthful on the polygraph test.

OSI -- The U.S. Air Force Office of Special
    Investigations.

p < -- Statistical notation for 'The probability that
    this result could have occurred purely by
    chance is less than...' In this study, results
    which have probabilities of .03 or less are
    assumed to have been caused by the factor being
    studied, rather than by chance. See
    probability.

Probability -- Probability is a statistic expressed as a
    number ranging from 0 to 1.0, in which the
    smaller the number, the less likely the event.
    A probability of .05 means that there are only
    five chances out of a hundred (one in twenty)
    that a given result could have occurred by chance.

RANOVA -- Repeated Measures Analysis of Variance. A
    special case of ANOVA where one or more of the
    dependent variables is a repeated measure from
    the same subject. An example would be the
    charts of a polygraph test. RANOVA takes
    statistical advantage of the fact that the
    repeated measures from the same individual are
    not independent observations.

SPR -- Specific Reaction. NSA examiners use this term
    in preference to DI, to indicate that a person
    appeared deceptive on the polygraph test.

tau c -- A non-parametric measure of association. For
    practical purposes tau c values can be treated
    as correlation coefficients.

TN -- See True negative.

TP -- See True positive.

True negative -- When a programmed innocent person is called
    truthful (NDI) on the polygraph. Synonymous
    with correct rejection.

True positive -- When a programmed guilty or knowledgeable
    person is called deceptive on the polygraph.

t-test -- A powerful statistical test used to determine
    if the effects of a single independent variable
    that has two levels are significantly different
    than that expected by chance. Mathematically,
    t-tests are a special case of ANOVA.
APPENDIX A
EXPERIMENT 1 FORMS



CONSENT FOR POLYGRAPH EXAMINATION



I, ____________________, voluntarily consent to
polygraph testing administered by examiners of the United
States Government. I understand that polygraph testing and
periodic retesting can be required as a condition of my
employment with the United States Military.

The procedures that are to be followed during the
examination have been explained to me, and I am aware that the
procedures will include the use of sensors to record my
physiological responses to questions. I understand that the questions
to be asked during the examination will be only those questions
necessary to resolve security and counterintelligence issues,
including but not limited to specific issues such as loyalty,
the compromise of classified information, and vulnerability to
blackmail, and that the questions will be reviewed with me, at
least in general, prior to the examination. I agree to keep the
details of the examination secret from all unauthorized persons.

I understand that any information relating to violations of
law or an imminent threat to life or property may be reported
to the Attorney General as required by Section 535 of Title 28
of the United States Code and Executive Order 12333 or its
successors, and also may be reported to appropriate law
enforcement or other government agencies for administrative,
investigative or legal action. I also understand that I have a
right against self-incrimination under the Fifth Amendment to
the Constitution of the United States and that I may refuse to
answer a question if my answer would tend to incriminate me.

I also have been briefed that any active duty member of the
United States Armed Forces must be advised during the initial
pretest, prior to signing this consent form, that any violation
of Article 31/U.C.M.J. might be reported to their respective
military service.

I understand the session with the polygraph examiner may be
monitored and is audio and video recorded for the purpose of
clarity and accuracy. I also understand that the session may
be videotaped for the purpose of research and training.

I have read the foregoing and understand its import fully.

IN WITNESS WHEREOF, I place my signature below, this ____ day
of ____________, 19__.



The above was read and signed in my presence this ____ day of
____________, 19__.
PERSONAL DATA FORM

THIS FORM IS AFFECTED BY THE PRIVACY ACT OF 1974

1. AUTHORITY: 10 USC 3012, 44 USC 3101 and 10 USC 1071-1087

2. PRINCIPAL PURPOSE: To plan involvement in a classified scenario.

3. ROUTINE USES: The requested information will be used to tailor the details
of a classified scenario to those individuals selected for participation. This
form will be destroyed (a) in the event you are not selected for a scenario or
(b) following your participation in the scenario. None of the information will
be furnished to anyone not directly involved in the research.

4. MANDATORY OR VOLUNTARY DISCLOSURE: Disclosure is voluntary. Failure to
provide the information may result in your being disqualified for participation
in the study.

PLEASE PRINT ALL INFORMATION AS LEGIBLY AS POSSIBLE.

Name:

Sex:

Height:

Weight:

Race: White   Black   Other

FOB:

Duty title:

Residence address:

Residence phone:

Marital status: Single   Married   Other

Do you have (or have access to) a vehicle?

Type:

Make:

Color:




PART B -- VOLUNTEER AFFIDAVIT

I, ____________________, being at least 18 years old,
do hereby volunteer to participate in a research study entitled "Polygraph
Screening Validation Study" being conducted at the Department of Defense
Polygraph Institute at Ft. McClellan under the direction of Gordon H. Barland,
Ph.D.

The implications of my participation; the nature, duration and purpose, and the
methods by which it is to be conducted; and the inconveniences and hazards to be
expected have been thoroughly explained to me as described above. I have been
given the opportunity to ask questions concerning this study, and any such
questions have been answered to my satisfaction. Should any further questions
arise concerning my rights on study-related injury, I may contact COL Cadol,
M.D., Director of the Noble Army Community Hospital, Ft. McClellan, Alabama,
36205 (Telephone number: 205/238-2200).

I understand that I may at any time revoke my consent and withdraw from the
study without prejudice.



Signature                                   Date



Witness
Code number: _____
Date: _________

POLYGRAPH ATTITUDES QUESTIONNAIRE

INSTRUCTIONS: Read each sentence until you understand what is being asked.
Circle the answer which best describes your attitude.

1. The polygraph, or "lie detector", is _________ able to tell when a
person is lying.

a. always b. usually c. sometimes d. rarely e. never

2. If I were suspected of a crime which I had actually committed,
I would _______ agree to take a polygraph test.

a. definitely b. probably c. might d. probably not e. never

3. If I were suspected of a crime which I had not committed,
I would _______ agree to take a polygraph test.

a. definitely b. probably c. might d. probably not e. never

4. If I were considered for a government job involving access to secret
information and were asked to take a preclearance polygraph test on
my background, I would _______ agree.

a. definitely b. probably c. might d. probably not e. never

5. If I were being considered for a job in a supermarket involving access
to money and were asked to take a pre-employment polygraph test on my
background, I would _______ agree.

a. definitely b. probably c. might d. probably not e. never

6. Use of the polygraph violates a person's privacy.

a. never b. rarely c. often d. usually e. always

7. Use of the polygraph is unethical.

a. never b. rarely c. often d. usually e. always

8. Comments:
Code number: _____
Date: _________

POLYGRAPH ACCURACY QUESTIONNAIRE

INSTRUCTIONS: Answer the following questions with the percentage which
best describes how you feel. Don't worry if your answers
are not consistent. Few people are consistent about
something like the lie detector. We are interested in
your initial reaction to the question. Answer the questions
as rapidly as feasible.

1. How accurate do you think the polygraph is in general?  ____%

In a murder case?  ____%

With the guilty person?  ____%

With an innocent person?  ____%

In pre-employment screening?  ____%

With someone who's lying?  ____%

With someone who's telling the truth?  ____%

When a person is lying about which of 5 numbers he picked?  ____%

2. How accurate do you think the polygraph would be on you?  ____%
1987 SCREENING STUDY
S DEBRIEFING FORM

INTRO: Your participation in the study is now over. You are free to talk to
me. Must be absolute truth, despite possible prior instructions to contrary.
Do not discuss w/ friends your role or your exam until 15 Sep 87.

S first name:

1. How did you like your exam? Was it what you expected? _____

What was different?

2. What was the best thing about it -- the most interesting thing?

3. What was the worst thing about it?

4. Were Y in the Guilty group, the Knowledge group, or the Innocent group?
(Check w/ our records). G K I

SCENARIOS (GUILTY and KNOWLEDGEABLE Ss):

5. How did you enjoy your scenario?

6. Was it realistic?

7. What was the most realistic thing about it?

8. What was the least realistic thing about it?

9. How could it have been improved?

10. DYK anyone who was Guilty/Knowledgeable? Who? Circumstances:

10a. Did you tell anyone about what you did, prior to the polygraph test?

11. Who else knows what you did?

12. Does anyone (else) suspect that you were guilty (or knew someone who was)?



Date:

Agency:
S Debriefing

GUILTY

13. What did you do with the money?

14. Did holding the money for all this time cause you any problems?

15. When can we arrange to retrieve it?

16. DYH any other equipment that you've not yet turned in?

POLYGRAPH (ALL):

17. How interesting did you find your polygraph experience?

18. What was the most interesting thing about it?

19. What was the least pleasant thing about it?

20. If you had the power and the authority to make any change in the
polygraph test that you wanted to, what would be the first thing
that you would change?

21. DY lie to any of the Qs? Which one(s)? Did any of the questions trouble
you?

21a. Was your polygraph test accurate?

22. What does the examiner look for when he's deciding whether you were
a spy or not? How do you suppose the test is graded?

23. While the test was in progress, did you feel yourself react to any
of the questions? Which ones?

24. What did you do in order to look as truthful as possible on the test?

25. DY control your breathing? _____ How?

26. What did you think about while you were attached to the polygraph
and the questions were being asked?

26a. Did you try to keep calm? _____ On just some of the questions?

26b. Did you try to look guilty on any of the questions?

26c. Did you try to create reactions to any of the questions?
Which ones?
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& Debriefing 



Mow? 

27. Did any question on the teet take you by eurprlae or catch you off-guard? 
27*. What was your reaction?. 



27b. Vhj do you suppose they were put there? 

28. To what extent do the followng words describe how you felt during the 
actual test? (Scsle of 1-10) 

Fear Of what? • 

Nervousness 

Boredoa 

Tenseness _____ 
Curiosity ' 

Guilt 

Anxiety 

Hope (of not being caught?) . 



29. What single word best describes how you felt on the actual test?



30. GUILTY/KNOWLEDGEABLE: Were you hoping to beat the test _____,

or were you hoping that your lies would be detected _____?

31. To what extent did you feel your polygraph examination was "for real?"
(1-10)



32. To what extent did you feel it was just a game? (1-10)

33. Were you mistreated in any way by the examiner?

34. Would you be willing to volunteer for another polygraph test on the
next research study we do? _______

35. Is there anything else you'd like to mention? 



NOTE: Have S fill out 3 Questionnaires: Test Program, pg accuracy,
and pg attitudes.

A couple of months from now we need to have you fill out these three
questionnaires one more time in order to see whether any changes that occurred
are short-term or long-term changes. These forms will be mailed to you. What
address will you be at two months from now?

APPENDIX B 
AGENCY EVALUATION QUESTIONNAIRES 

BLIND EVALUATION Date 

SUBJECT CODE NUMBER _____ ETYPE _____ Evaluator _____



PLEASE READ THIS ENTIRE FORM BEFORE YOU VIEW ANY EXAMINATION TAPES! 

We want you to give us as objective an opinion as possible about the 
examination you are going to watch. We realize that some of the judgments we 
are asking you to make are difficult ones, and that this is a tedious task. 
However, your job is a very important one in helping us to understand the 
results of this study. Please do the best you can, and please give us a 
response to every rating scale item, even if you are not as sure as you would 
like to be about your response. 

Please watch the pretest interview for this examination. Take your time and 
observe the examination carefully. Take notes in the space provided. If you 
need additional space for notes please use the backs of these pages or attach 
additional sheets. We are particularly interested in any significant 
differences between this examination and the way your agency conducts 
examinations in the field. After you have finished watching the pretest 
interview, and before you watch the remainder of the examination, answer the 
questions following the notes section. 

Pretest interview notes _ 



Questions: Please indicate the response that most closely expresses your 
opinion by circling the number and writing the number in the blank at the left. 

In general, how much was this pretest like a pretest conducted by 

your agency in the field? 

1 2 3 4 5 6 7 

Not at all like the field Just like the field 

With regard to the length of the pretest, was this pretest longer, 

shorter, or the same length as a typical pretest conducted by your 
agency in the field? 



1 2 3 4 5 6 7 

Shorter About the Same Longer 

Was the examiner's description of the polygraph and the physiology 
of the detection of deception typical of those given by your agency 
in field examinations? 

1 2 3 4 5 6 7 

Less than the field About the Same More than the field 

Were admonitions about movement and/or breathing about the same as, 
stronger, or weaker than those given in the field? 

1 2 3 4 5 6 7 

Weaker Than the Field Same as the Field Stronger Than the Field 



Was the presentation and definition of the RELEVANT questions 
similar or dissimilar to the definition and presentation used by 
your agency in the field? 

1 2 3 4 5 6 7 

Not at All Like The Field Just like the field 

If you felt that the definition and presentation of the RELEVANT 
questions was different from that used by your agency in the field 
please tell us about those differences. 



Was the emphasis placed on the RELEVANT questions in this pretest 

typical, less, or more than the emphasis placed on relevant 
questions during field examinations conducted by your agency? 

1 2 3 4 5 6 7 

Less emphasis About the Same More Emphasis 

If you felt that the amount of emphasis placed on the RELEVANT 
questions was different from that used by your agency in field 
examinations, please tell us about the differences. 

Was the presentation and definition of the CONTROL questions similar 

or dissimilar to the definition and presentation used by your agency 
in the field? 

1 2 3 4 5 6 7 

Not at all like the field Just like the field 

If you felt that the definition and presentation of the CONTROL 
questions was very different from that used by your agency in the 
field, please tell us about those differences. 



Was the emphasis placed on the CONTROL questions in this pretest 

typical, less, or more than the emphasis placed on relevant 
questions during field examinations conducted by your agency? 

1 2 3 4 5 6 7 

Less emphasis About the Same More Emphasis 

If you felt that the amount of emphasis placed on the CONTROL 
questions was very different from that used by your agency in field 
examinations, please tell us about the differences. 



Based on your observation of this pretest interview, if the subject 

was actually GUILTY do you think this pretest interview would 
produce an accurate or inaccurate outcome? 

1 2 3 4 5 6 7 

Very Inaccurate Very Accurate 

Briefly, why do you feel the way you do? 

Based on your observation of this pretest interview, if the subject 
was actually INNOCENT do you think this pretest interview would 
produce an accurate or inaccurate outcome? 

1 2 3 4 5 6 7 

Very Inaccurate Very Accurate 

Briefly, why do you feel the way you do? 



Given your observation of the subject's behavior and statements, do 
you believe the subject to be guilty or innocent? (If you think the 
subject has guilty knowledge consider him/her to be guilty.) 

1 2 3 4 5 6 7 

Guilty Innocent 

Please give us any general comments you may have about this pretest 
examination. 

Now please watch the intervals between the charts, and take notes if you wish. 



Please answer the following questions by indicating the response that most 
closely expresses your opinion by circling the number and writing the number in 
the blank at the left. 



In the between chart interval, were the RELEVANT questions 

discussed, and if they were, did the discussion emphasize the 
RELEVANT questions more, less, or about the same as they would be in 
a typical examination conducted by your agency? 

1 2 3 4 5 6 7 

Not discussed Less emphasis about the same More emphasis 

If you responded "Not discussed" is that standard practice for your 

agency? YES NO 

In the between chart interval, were the CONTROL questions discussed, 

and if they were, did the discussion emphasize the CONTROL questions 
more, less, or about the same as they would be in a typical 
examination conducted by your agency? 

1 2 3 4 5 6 7 

Not discussed Less emphasis about the same More emphasis 

If you responded "Not discussed" is that standard practice for your 

agency? YES NO 

Please let us have any additional comments you may have on this examination. 



NONBLIND EVALUATION Date 

SUBJECT CODE NUMBER _____ ETYPE _____ Evaluator _____

Examination outcome Correct/Incorrect Scenario 



This subject produced a polygraph outcome as indicated. We want you to give us 
as objective an opinion as is possible about this examination, and why its 
outcome was correct or incorrect. We realize that some of the judgments we are 
asking you to make are difficult ones, and that this is a tedious task. 
However, your job is a very important one in helping us to understand the 
results of this study. Please do the best you can, and please give us a 
response to every rating scale item, even if you are not as sure as you would 
like to be about your response. 

Please watch the pretest interview for this examination. Take your time and 
observe the examination carefully. Take notes in the space provided. If you 
need additional space for notes please use the backs of these pages or attach 
additional sheets. We are particularly interested in any significant 
differences between this examination and the way your agency conducts 
examinations in the field. We are also very interested in any insight you can 
provide about why the polygraph worked or did not work in this case. After you 
have finished watching the pretest interview, and before you watch the 
remainder of the examination, answer the questions following the notes section. 

Pretest interview notes _ 



Questions: Please indicate the response that most closely expresses your 
opinion by circling the number and writing the number in the blank at the left. 

In general, how much was this pretest like a pretest conducted by 

your agency in the field? 

1 2 3 4 5 6 7 

Not at all like the field Just like the field 

With regard to the length of the pretest, was this pretest longer, 

shorter, or the same length as a typical pretest conducted by your 
agency in the field? 



1 2 3 4 5 6 7 

Shorter About the Same Longer 

Was the examiner's description of the polygraph and the physiology 

of the detection of deception typical of those given by your agency 
in field examinations? 

1 2 3 4 5 6 7 

Less than the field About the Same More than the field 



Was the presentation and definition of the RELEVANT questions 

similar or dissimilar to the definition and presentation used by 
your agency in the field? 

1 2 3 4 5 6 7 

Not at All Like The Field Just like the field 

If you felt that the definition and presentation of the RELEVANT 
questions was different from that used by your agency in the field 
please tell us about those differences. 



How well did the RELEVANT questions cover the subject's actions in 

the scenario? 

1 2 3 4 5 6 7 

Not Covered at All Covered Completely 



Was the emphasis placed on the RELEVANT questions in this pretest 

typical, less, or more than the emphasis placed on relevant 
questions during field examinations conducted by your agency? 

1 2 3 4 5 6 7 

Less emphasis About the Same More Emphasis 

If you felt that the amount of emphasis placed on the RELEVANT 
questions was different from that used by your agency in field 
examinations, please tell us about the differences. 

Was the presentation and definition of the CONTROL questions similar 
or dissimilar to the definition and presentation used by your agency 
in the field? 

1 2 3 4 5 6 7 

Not at all like the field Just like the field 

If you felt that the definition and presentation of the CONTROL 
questions was very different from that used by your agency in the 
field, please tell us about those differences. 



Did the control questions overlap the subject's actions in the 
scenario? 

1 2 3 4 5 6 7 

Complete Overlap No Overlap 

Was the emphasis placed on the CONTROL questions in this pretest 

typical, less, or more than the emphasis placed on relevant 
questions during field examinations conducted by your agency? 

1 2 3 4 5 6 7 

Less emphasis About the Same More Emphasis 

If you felt that the amount of emphasis placed on the CONTROL 
questions was very different from that used by your agency in field 
examinations, please tell us about the differences. 



Now please watch the intervals between the charts, and take notes if you wish. 

Please answer the following questions by indicating the response that most 
closely expresses your opinion by circling the number and writing the number in 
the blank at the left. 

Were admonitions about movement and/or breathing about the same as, 
stronger, or weaker than those given in the field? 

1 2 3 4 5 6 7 

Weaker Than the Field Same as the Field Stronger Than the Field 

In the between chart interval, were the RELEVANT questions 
discussed, and if they were, did the discussion emphasize the 
RELEVANT questions more, less, or about the same as they would be in 
a typical examination conducted by your agency? 

1 2 3 4 5 6 7 

Not discussed Less emphasis About the same More emphasis 

If you responded "Not discussed" is that standard practice for your 
agency? YES NO 

In the between chart interval, were the CONTROL questions discussed, 
and if they were, did the discussion emphasize the CONTROL questions 
more, less, or about the same as they would be in a typical 
examination conducted by your agency? 

1 2 3 4 5 6 7 

Not discussed Less emphasis About the same More emphasis 

If you responded "Not discussed" is that standard practice for your 

agency? YES NO 

Given that the outcome of this examination was correct/incorrect, why do you 
think it turned out the way it did? 

APPENDIX C 
EXPERIMENT 1 QUESTIONNAIRE RESULTS 

Affective Descriptors 

Mean affective descriptors given by subjects on the 
Debriefing questionnaire (Appendix A) following their polygraph 
examinations are summarized in Table 17. Differences between the 
Innocent and Guilty/Knowledgeable subjects were tested with 
paired measurements t-tests, and the conditions were found to 
differ on the descriptors nervous, t (203) = 2.68, p < 0.01, 
tense, t (203) = 2.44, p < 0.05, guilty, t (203) = 7.12, p < 
0.001, and anxious, t (203) = 4.03, p < 0.001. 



Table 17. Mean Affective Descriptors Given by Innocent and Guilty 
Subjects Following Their Polygraph Examinations. 

Descriptor      Innocent      Guilty 

Nervous           4.37         5.35* 
Tense             4.06         4.98* 
Guilty            2.03         4.66* 
Anxious           4.02         5.64* 
Fearful           3.36         3.70 
Bored             3.48         3.57 
Curious           6.13         8.57 
Hopeful           6.86         6.66 

*Indicates a significant difference between Innocent and Guilty 
conditions. 



Data Reduction for the Attitude Questionnaire (Appendix A) 

Questions were coded so that pro-polygraph responses were 
given scores of 1 or 2 and anti-polygraph responses were given 
scores of 4 or 5; neutral responses were coded as 3. For 
example, answering 'always' (Choice a) to the question "The 
polygraph is able to tell when a person is lying" would score a 1, 
whereas choosing 'never' (Choice e) would score a 5. 
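
The coding rule above can be sketched in a few lines of Python. The five choices (a through e) and the 'always'/'never' example come from the text; the helper name `code_response` and the `reverse` flag are hypothetical, and the existence of items worded so that Choice a is the anti-polygraph answer is an assumption made only for illustration.

```python
# Hypothetical helper illustrating the 1-5 coding described in the text.
CHOICES = ["a", "b", "c", "d", "e"]  # Choice a = 'always' ... Choice e = 'never'


def code_response(choice: str, reverse: bool = False) -> int:
    """Return 1-2 for pro-polygraph, 3 for neutral, 4-5 for anti-polygraph.

    reverse=True is an assumed option for items on which Choice a would be
    the anti-polygraph answer, so the scale is flipped before scoring.
    """
    letter = choice.lower()
    if letter not in CHOICES:
        raise ValueError(f"unknown choice: {choice!r}")
    position = CHOICES.index(letter) + 1      # a -> 1 ... e -> 5
    return 6 - position if reverse else position


# The example item from the text: 'always' scores 1, 'never' scores 5.
print(code_response("a"), code_response("e"))
```

Keeping every item on a common pro-to-anti scale is what allows the item means in the later tables to be compared directly.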
Attitude Questionnaire (Appendix A) 

Subjects' responses to the seven questions of the Attitude 
Questionnaire given before the subjects' examinations were 
subjected to a discriminant analysis to determine if the 
polygraph outcomes (Correct, Incorrect, Inconclusive) could be 
predicted from existing attitudes. The analysis failed to find a 
significant discriminant solution. That is, no response to any 
question, or responses to any combination of questions, predicted 
the outcomes of subsequent polygraph examinations. 

The responses to the same questions during the post-test and 
follow-up administrations of the Attitude Questionnaire were 
analyzed with a Question (7) X Time (Post-Test, Follow-Up) X 
Outcome (Correct, Incorrect, Inconclusive) X Condition (Innocent, 
Guilty) RANOVA. That analysis revealed significant main effects 
for Outcome, F (2, 112) = 5.61, p < 0.005, and Question, F (6, 
672) = 5.25, p < 0.001. There was a significant interaction of 
Outcome and Question, F (12, 672) = 1.99, p < .023. The mean 
responses for question and outcome are shown in Table 18. 



Table 18. Mean responses to attitude questionnaire by test 
outcome collapsed across Post-Test and Follow-Up 
administrations. 

                                 Outcome 

Question       Correct             Incorrect          Inconclusive 
Number     Mean  S.D.  (N)     Mean  S.D.  (N)     Mean  S.D.  (N) 

  1.       2.0   .36   (71)    2.3   .59   (43)    2.2   .37   (8) 
  2.       1.9   .69   (71)    1.9   .66   (43)    1.9   .95   (8) 
  3.       2.0   1.0   (71)    2.4   1.3   (43)    2.1   .86   (8) 
  4.       1.6   .62   (71)    1.9   .85   (43)    1.8   .84   (8) 
  5.       1.9   .83   (71)    2.2   .79   (43)    2.1   .62   (8) 
  6.       2.6   .06   (71)    2.4   .63   (43)    2.2   .53   (8) 
  7.       2.2   .59   (69)    2.1   .73   (41)    2.?   .71   (8) 

Possible changes in perceptions of test accuracy between the 
Pre-Test and the Post-Test administrations of the Attitude 
Questionnaire were tested with a Question (7) X Time (Pre-Test, 
Post-Test) X Outcome (Correct, Incorrect, Inconclusive) RANOVA. 
The hypothesis of primary interest was whether a 
correct or incorrect outcome would interact with the subjects' 
perceptions of polygraph accuracy. There was a significant 
Outcome by Question interaction, F (12, 1164) = 1.82, p < .04, 
but there were no effects or interactions associated with the 
Time factor. The means of the seven questions, collapsed across 
the Pre-Test and Post-Test administrations of the Attitude 
Questionnaire, are shown by outcome in Table 19. 
Table 19. Mean responses to Attitude Questionnaire items by test 
outcome collapsed across Pre- and Post-Test 
administrations. 

                                 Outcome 

Question       Correct             Incorrect          Inconclusive 
Number     Mean  S.D.  (N)     Mean  S.D.  (N)     Mean  S.D.  (N) 

  1.       2.1   .39   (130)   2.2   .42   (58)    2.3   .37   (14) 
  2.       2.0   .75   (132)   1.9   .69   (59)    1.9   .79   (14) 
  3.       1.9   .89   (132)   2.1   .87   (59)    2.5   1.1   (14) 
  4.       1.5   .58   (132)   1.6   .61   (59)    1.9   .91   (14) 
  5.       1.9   .89   (132)   2.?   .75   (59)    2.2   .95   (14) 
  6.       2.4   .82   (130)   2.3   .65   (59)    2.1   .91   (14) 
  7.       2.0   .62   (129)   2.?   .?2   (56)    2.?   1.0   (14) 


Another issue tested in the Attitude Questionnaire data was 
whether or not certain tests (i.e., those containing control 
questions) were perceived as more intrusive than others. 
Question Number Six (How often does the polygraph violate a 
person's privacy?) at Time 2 was used as the dependent variable 
and Agency was a grouping variable for an ANOVA. This analysis 
found no difference between agencies in subjects' perceptions of 
how often the polygraph violates a person's privacy. The mean 
response to Question Six for the four agencies is shown in Table 
20. 



Table 20. Mean Post-Test response to Question Six by agency. 

Agency     Mean     S.D.     N 

MI         2.5      1.1      57 
OSI        2.4      .77      51 
CIA        2.5      1.1      44 
NSA        2.3      .96      51 
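
As a rough cross-check of the "no difference between agencies" result, a one-way ANOVA F ratio can be recomputed from the summary statistics in Table 20 alone. The sketch below is illustrative only: the function name `anova_from_summary` is hypothetical, the report does not state its F value, and because the tabled means and standard deviations are rounded the recomputed ratio is only approximate.

```python
def anova_from_summary(groups):
    """One-way ANOVA from per-group (mean, sd, n) summaries.

    Returns (F, df_between, df_within).
    """
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / total_n
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups)
    # Within-groups sum of squares: pooled from each group's variance.
    ss_within = sum((n - 1) * sd ** 2 for _, sd, n in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# (mean, S.D., N) for each agency, as printed in Table 20.
table_20 = {
    "MI": (2.5, 1.1, 57),
    "OSI": (2.4, 0.77, 51),
    "CIA": (2.5, 1.1, 44),
    "NSA": (2.3, 0.96, 51),
}

f_ratio, df_b, df_w = anova_from_summary(list(table_20.values()))
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

With these rounded inputs the ratio comes out below 1, meaning the between-agency variation is smaller than the within-agency variation, which is consistent with the reported absence of an agency effect.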



Percentage Questionnaire (Appendix A) 

This questionnaire asked for percentage estimates of 
polygraph accuracy in various situations with different kinds of 
examinees. The nine questions of the Percentage Questionnaire 
were subjected to discriminant analysis to determine if the 
polygraph outcomes (correct, incorrect, inconclusive) could be 
predicted from existing attitudes. There was no significant 
discriminant solution. That is, no response to any question, and 
no combination of responses to the questions, could predict test 
outcomes. 

Subjects' percentage estimates of polygraph accuracy for 
various situations were analyzed in a Question (9) X Time (Pre-, 
Post-, and Follow-Up) X Outcome (Correct, Incorrect, 
Inconclusive) RANOVA. There was a significant Outcome effect, F 
(2, 110) = 4.64, p < .012, and a significant Outcome by Question 
interaction, F (16, 880) = 1.70, p = .041. The means for the nine 
questions collapsed across administrations are shown in Table 21 
by outcome. 



Table 21. Mean responses to Percentage Questionnaire by test 
outcome collapsed across administrations. 

                                 Outcome 

Question       Correct             Incorrect          Inconclusive 
Number     Mean  S.D.  (N)     Mean  S.D.  (N)     Mean  S.D.  (N) 

  1.       84.8  11.7  (92)    74.5  17.4  (21)    85.9  10.8  (7) 
  2.       84.9  12.7  (93)    76.6  17.8  (21)    88.1  11.6  (7) 
  3.       84.6  13.5  (93)    76.5  18.5  (21)    89.1  13.5  (7) 
  4.       82.0  15.?  (92)    72.2  17.?  (21)    76.0  16.9  (7) 
  5.       80.3  14.?  (92)    72.9  17.5  (21)    74.2  11.3  (7) 
  6.       83.7  12.7  (93)    75.3  18.5  (21)    83.2  10.8  (7) 
  7.       83.2  14.1  (93)    73.3  74.?  (21)    74.0  17.6  (7) 
  8.       87.1  12.?  (91)    75.1  20.1  (21)    87.0  12.0  (7) 
  9.       86.7  12.5  (90)    74.1  17.7  (21)    90.6   8.7  (6) 


These results suggest that those who had incorrect outcomes rated 
the polygraph as less accurate than those with inconclusive or 
correct results. There was no effect for time or interaction 
between outcome and time. 
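
The degrees of freedom reported for the repeated-measures analyses in this appendix can be cross-checked with the usual mixed-design formulas, as in the sketch below. It assumes the standard split-plot relationships (interaction df is the product of the two effects' df; the within-subjects error df is the within-factor df times the between-subjects error df). The function names are hypothetical, and the report itself does not spell out how its df were obtained.

```python
def interaction_df(levels_a: int, levels_b: int) -> int:
    """Numerator df for an A x B interaction."""
    return (levels_a - 1) * (levels_b - 1)


def within_error_df(within_levels: int, between_error_df: int) -> int:
    """Error df for effects involving a within-subjects factor."""
    return (within_levels - 1) * between_error_df


# (label, number of questions, number of outcomes,
#  between-subjects error df, reported Outcome x Question df)
reported = [
    ("Attitude Questionnaire, Post-Test/Follow-Up", 7, 3, 112, (12, 672)),
    ("Percentage Questionnaire, three administrations", 9, 3, 110, (16, 880)),
]

for label, n_questions, n_outcomes, between_error, df_reported in reported:
    df_expected = (
        interaction_df(n_outcomes, n_questions),
        within_error_df(n_questions, between_error),
    )
    status = "matches" if df_expected == df_reported else "differs from"
    print(f"{label}: expected {df_expected} {status} reported {df_reported}")
```

Both reported Outcome by Question interactions agree with these formulas, which supports the reading of the garbled statistics as F (12, 672) and F (16, 880).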

Another theoretical question addressed with the Percentage 
Questionnaire data was the difference in perception of polygraph 
accuracy between others (Question One) and the subjects 
themselves (Question Nine). A paired t-test was used to compare 
the percentage estimates of polygraph accuracy 'in general' and 
'on me (the subject)', and the difference was significant, t 
(201) = -4.37, p < .001. The means for the first and ninth 
questions were 81 and 84; the standard deviations were 16.6 and 
17.1, respectively. 
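
For readers who want to see how the paired statistic relates to the reported means, the sketch below back-calculates the standard deviation of the paired differences implied by t (201) = -4.37 and a mean difference of 81 - 84. This is an illustration, not a value reported in the study: the function name is hypothetical, and because the means are rounded to whole numbers the implied figure is only a rough estimate.

```python
import math


def implied_sd_of_differences(mean_diff: float, t_value: float, df: int) -> float:
    """Solve t = mean_diff / (sd_diff / sqrt(n)) for sd_diff.

    For a paired t-test, df = n - 1, so n = df + 1 pairs.
    """
    n_pairs = df + 1
    return abs(mean_diff) * math.sqrt(n_pairs) / abs(t_value)


# Question One ('in general') minus Question Nine ('on me'), as reported.
sd_diff = implied_sd_of_differences(mean_diff=81 - 84, t_value=-4.37, df=201)
print(f"implied S.D. of paired differences: {sd_diff:.1f}")
```

The implied spread of the differences is well below the standard deviations of the two questions themselves (16.6 and 17.1), which is what one would expect if subjects' two estimates were strongly correlated.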



